ERGODIC CONVERGENCE OF A STOCHASTIC
PROXIMAL POINT ALGORITHM*

PASCAL BIANCHI†


Abstract. The purpose of this paper is to establish the almost sure weak ergodic convergence
of a sequence of iterates $(x_n)$ given by

$$x_{n+1} = (I + \lambda_n A(\xi_{n+1}, \cdot))^{-1}(x_n)$$

where $(A(s, \cdot) : s \in E)$ is a collection of maximal monotone operators on a separable Hilbert space,
$(\xi_n)$ is an independent identically distributed sequence of random variables on $E$ and $(\lambda_n)$ is a
positive sequence in $\ell^2 \setminus \ell^1$. The weighted averaged sequence of iterates is shown to converge weakly
to a zero (assumed to exist) of the Aumann expectation $\mathbb{E}(A(\xi_1, \cdot))$ under the assumption that the
latter is maximal. We consider applications to stochastic optimization problems of the form

$$\min\ \mathbb{E}(f(\xi_1, x)) \quad \text{w.r.t. } x \in \bigcap_{i=1}^m X_i$$

where $f$ is a normal convex integrand and $(X_i)$ is a collection of closed convex sets. In this case,
the iterations are closely related to a stochastic proximal algorithm recently proposed by Wang and
Bertsekas.
1. Introduction. The proximal point algorithm is a method for finding a zero
of a maximal monotone operator $A : \mathcal{H} \to 2^{\mathcal{H}}$ on some Hilbert space $\mathcal{H}$, i.e., a point
$x \in \mathcal{H}$ such that $0 \in A(x)$. The approach dates back to [24], [40], [13] and has given rise to
a vast literature. The algorithm consists of the iterations

$$y_{n+1} = (I + \lambda_n A)^{-1} y_n$$

for $n \in \mathbb{N}$ where $\lambda_n > 0$ is a positive step size. When the sequence $(\lambda_n)$ is bounded
away from zero, it was shown in [40] that $(y_n)$ converges weakly to some zero of $A$
(assumed to exist). The case of vanishing step size was investigated by several authors
including [13], [31], see also [1]. The condition $\sum_n \lambda_n = +\infty$ is generally insufficient
to ensure the weak convergence of the iterates $(y_n)$ unless additional assumptions
on $A$ are made (typically, $A$ must be demi-positive). A counterexample is obtained
when $A$ is a $\pi/2$-rotation in the 2D-plane and $\sum_n \lambda_n^2 < \infty$. However, the condition
$\sum_n \lambda_n = +\infty$ is sufficient to ensure that $y_n$ converges weakly in average to a zero
of $A$. Here, by weak convergence in average, or weak ergodic convergence, we mean
that the weighted averaged sequence

$$\bar y_n = \frac{\sum_{k=1}^n \lambda_k y_k}{\sum_{k=1}^n \lambda_k}$$

converges weakly to a zero of $A$.
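
As a numerical illustration of the rotation counterexample, the following sketch runs the proximal point iteration for the $\pi/2$-rotation of the plane with the step sizes $\lambda_n = 1/n$, which are square-summable but not summable. The function name `rotation_ppa`, the starting point and the step-size rule are assumptions made for this example only. The iterates keep a norm bounded away from zero while they keep rotating, whereas the weighted average slowly drifts toward the unique zero of the operator, the origin.

```python
import math


def rotation_ppa(num_iter, y0=(1.0, 0.0)):
    """Proximal point iteration y_{n+1} = (I + lam_n R)^{-1} y_n for the
    pi/2-rotation R = [[0, -1], [1, 0]], with lam_n = 1/n (an assumed choice).

    Returns the last iterate and the weighted average
    sum_k lam_k y_k / sum_k lam_k.
    """
    y = y0
    num = [0.0, 0.0]  # running sum of lam_k * y_k
    den = 0.0         # running sum of lam_k
    for n in range(1, num_iter + 1):
        lam = 1.0 / n
        num[0] += lam * y[0]
        num[1] += lam * y[1]
        den += lam
        # Closed-form resolvent:
        # (I + lam R)^{-1} = [[1, lam], [-lam, 1]] / (1 + lam^2)
        scale = 1.0 + lam * lam
        y = ((y[0] + lam * y[1]) / scale, (-lam * y[0] + y[1]) / scale)
    return y, (num[0] / den, num[1] / den)


if __name__ == "__main__":
    last, avg = rotation_ppa(100_000)
    print("norm of last iterate    :", math.hypot(*last))
    print("norm of averaged iterate:", math.hypot(*avg))
```

Each resolvent step shrinks the norm by the factor $(1+\lambda_n^2)^{-1/2}$, so the product of these factors stays positive when $\sum_n \lambda_n^2 < \infty$; the averaged sequence decays only at the rate of $1/\sum_{k \le n} \lambda_k$, which is logarithmic here.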

This paper extends the above result to the case where the operator $A$ is no longer
fixed but is replaced at each iteration $n$ by one operator randomly chosen amongst a
collection $(A(s, \cdot) : s \in E)$ of maximal monotone operators. We study the random
sequence $(x_n)$ given by

$$x_{n+1} = (I + \lambda_n A(\xi_{n+1}, \cdot))^{-1} x_n \tag{1.1}$$

where $(\xi_n)$ is an independent identically distributed sequence with probability
distribution $\mu$ on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We refer to the above iterations as
the stochastic proximal point algorithm. Under mild assumptions on the collection
of operators, the random sequence $(x_n)$ generated by the algorithm is shown to be
bounded with probability one. The main result is that almost surely, $(x_n)$ converges
weakly in average to some random point within the set of zeroes (assumed non-empty)
of the mean operator $\mathcal{A}$ defined by

$$\mathcal{A} : x \mapsto \int A(s, x)\, d\mu(s)$$

where $\int$ represents the Aumann integral [5, Chapter 8]. While the operator $\mathcal{A}$ is
always monotone, our key assumption is that it is also maximal. This condition is
satisfied in a number of particular cases. For instance, when the random variable $\xi_1$
belongs almost surely to a finite set, say $\{1, \dots, m\}$, $\mathcal{A}(x)$ coincides with the Minkowski
sum

$$\mathcal{A}(x) = \sum_{i=1}^m \mathbb{P}(\xi_1 = i)\, A(i, x)$$

for every $x \in \mathcal{H}$, and $\mathcal{A}$ is maximal under the sufficient condition that the interiors of
the domains of all operators $A(i, \cdot)$ ($i = 1, \dots, m$) have a non-empty intersection [38].

Related works and applications. In the literature, numerous works have been
devoted to iterative algorithms searching for zeroes of a sum of maximal monotone operators.
One of the most celebrated approaches is the Douglas-Rachford algorithm analyzed
by [23]. Though suited to a sum of two operators, the Douglas-Rachford algorithm
can be adapted to an arbitrary finite sum using the so-called product space trick. The
authors of [13] and [31] consider applying products of resolvents in a cyclic manner.
Numerically, the above deterministic approaches become difficult to implement when
the number of operators in the sum is large, or a fortiori infinite (i.e., the mean
operator is an integral). In parallel, stochastic approximation techniques have been
developed in the statistical literature to find a root of an integral functional
$h : \mathcal{H} \to \mathcal{H}$ of the form $h(x) = \int H(s, x)\, d\mu(s)$. The archetypal algorithm writes
$x_{n+1} = x_n - \lambda_n H(\xi_{n+1}, x_n)$ as proposed in the seminal work of Robbins and Monro [32]. It
turns out that the iterates (1.1) have a similar form

$$x_{n+1} = x_n - \lambda_n A_{\lambda_n}(\xi_{n+1}, x_n)$$

where $A_\lambda(s, \cdot)$ is the so-called Yosida approximation of the monotone operator $A(s, \cdot)$.
As a matter of fact, our analysis borrows some proof ideas from the stochastic
approximation literature [2].

Applications of stochastic approximation include the minimization of integral
functionals of the form $x \mapsto \mathbb{E}(f(\xi_1, x))$ where $(f(s, \cdot) : s \in E)$ is a collection of proper
lower-semicontinuous convex functions on $\mathcal{H} \to (-\infty, +\infty]$. We refer to [28] or to [10]
for a survey. In particular, the benefits in terms of convergence rate of considering
average iterates $\bar x_n = \sum_{k \le n} \lambda_k x_k / \sum_{k \le n} \lambda_k$ are established by [28] in the context of
convex programming and in [21] in the context of variational inequalities. Averaging
of the iterates is introduced in these works (see also [3] for more recent results) where
improved complexity results are the main motivation. For instance, the stochastic
subgradient algorithm writes $x_{n+1} = x_n - \lambda_n \tilde\nabla f(\xi_{n+1}, x_n)$ where $\tilde\nabla f(\xi_{n+1}, x_n)$ represents
a subgradient of $f(\xi_{n+1}, \cdot)$ at point $x_n$ (assumed in this case to be everywhere well
defined). The algorithm is often analyzed under a uniform boundedness assumption
on the subgradients [28], [10]. In practice, a reprojection step is often introduced to
enforce the boundedness of the iterates.

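Denoting by $A(s, \cdot)$ the subdifferential of $f(s, \cdot)$, the resolvent $(I + \lambda A(s, \cdot))^{-1}$
coincides with the proximity operator associated with $f(s, \cdot)$ given by

$$\operatorname{prox}_{\lambda f(s, \cdot)}(x) = \arg\min_{t \in \mathcal{H}}\ \lambda f(s, t) + \frac{\|t - x\|^2}{2} \tag{1.2}$$

for any $x \in \mathcal{H}$. The iterations (1.1) can be equivalently written as

$$x_{n+1} = \operatorname{prox}_{\lambda_n f(\xi_{n+1}, \cdot)}(x_n). \tag{1.3}$$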
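
To make (1.2) and (1.3) concrete, here is a minimal sketch for the scalar quadratic integrand $f(s, x) = (x - s)^2/2$, an illustrative choice for which the proximity operator has the closed form $(x + \lambda s)/(1 + \lambda)$. The function names, the step sizes $\lambda_n = 1/n$ and the two-point sample distribution are assumptions made for the example, not part of the referenced algorithms.

```python
import random


def prox_quadratic(lam, s, x):
    """Proximity operator of t -> lam * (t - s)^2 / 2 at x, i.e. the minimizer
    of lam * (t - s)^2 / 2 + (t - x)^2 / 2."""
    return (x + lam * s) / (1.0 + lam)


def stochastic_prox(samples, x0=0.0):
    """Run x_{n+1} = prox_{lam_n f(xi_{n+1}, .)}(x_n) with lam_n = 1/n.

    Returns the list of updated iterates and the weighted average
    sum_k lam_k x_k / sum_k lam_k, where lam_k weights the iterate x_k
    that is fed into step k.
    """
    x, iterates = x0, []
    num = den = 0.0
    for n, s in enumerate(samples, start=1):
        lam = 1.0 / n
        num += lam * x          # weight the current iterate
        den += lam
        x = prox_quadratic(lam, s, x)
        iterates.append(x)
    return iterates, num / den


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [rng.choice([0.0, 1.0]) for _ in range(10_000)]
    _, x_bar = stochastic_prox(draws)
    print("weighted average of the iterates:", x_bar)
```

For this integrand the minimizer of $x \mapsto \mathbb{E}(f(\xi_1, x))$ is the mean of $\xi_1$, so with the fair two-point distribution used in the usage example the averaged iterate is expected to settle near 0.5. Each proximal step is a convex combination of the current iterate and the sample, which keeps all iterates inside the range of the samples.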

A related algorithm is studied (among others) by Bertsekas in [11] under the
assumption that the random variable $\xi_n$ has a finite range and $f(s, \cdot)$ is defined on a
Euclidean space with values in $\mathbb{R}$. As functions are
supposed to have full domain, [11] introduces a projection step onto a closed convex
set in order to cover the case of constrained minimization. When there exists a
constant $c$ such that the functions $f(s, \cdot)$ are $c$-Lipschitz continuous for all $s$, and under
other technical assumptions, the algorithm of [11] is proved to converge to a sought
minimizer. In [45], the finite range assumption is dropped and random projections
are introduced. Extension to variational inequalities is considered in [46] (see also the
discussion below).

An important aspect is related to the analysis of the convergence rates of the
iterates (1.3). The working draft [41] was brought to our attention during the review
process of this paper. The authors analyze a related algorithm and provide asymptotic
convergence rates in the case where the monotone operators $A(s, \cdot)$ are gradients of
convex functions in $\mathbb{R}^n$ and assuming moreover that these functions have the same
domain, are all strongly convex and twice differentiable.

In order to illustrate (1.1), we provide some application examples without insisting 
on the hypotheses for the moment. 

The simplest application example corresponds to the following feasibility problem:
given a collection of closed convex sets $X_1, \dots, X_m$, find a point $x$ in their intersection
$X = \bigcap_{i=1}^m X_i$. The interest lies in the case where $X$ is not known but revealed
through random realizations of the $X_i$'s, so that a straightforward projection onto
$X$ is unaffordable [27], [7]. The algorithm (1.3) encompasses this case by letting
$f(\xi_{n+1}, \cdot)$ coincide with the indicator function $\iota_{X_{\xi_{n+1}}}$ of the set $X_{\xi_{n+1}}$ (equal to
zero on that set and to $+\infty$ elsewhere), where $\xi_{n+1}$ is randomly chosen in the set
$E = \{1, \dots, m\}$ according to some distribution $\mu = \sum_{i=1}^m \alpha_i \delta_i$ where all the $\alpha_i$'s
are positive and $\delta_i$ is the Dirac measure at $i$. In this case, the algorithm (1.3) boils
down to a special case of [27] and consists of successive projections onto randomly
selected sets. The algorithm is of particular interest when $m$ is large (our framework
even encompasses the case of an infinite number of sets) or in the case of distributed
optimization methods: in that case, $X_i$ is the set of local constraints of an agent $i$ and
$X$ is nowhere observed [10]. As pointed out in [27], examples of applications include
fair rate allocation problems in wireless networks where $X_i$ represents a set of channel
states [18], [20], [43] or image restoration and tomography [14], [17].
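
The random projection scheme can be sketched as follows for two sets in the plane. The proximity operator of an indicator function is the Euclidean projection onto the set, whatever the step size, so no $\lambda_n$ appears in the code. The two sets (a unit ball and a half-space), the uniform sampling rule and all function names are assumptions chosen for illustration.

```python
import math
import random


def project_ball(x, radius=1.0):
    """Euclidean projection onto the closed ball of given radius centred at 0."""
    norm = math.hypot(x[0], x[1])
    if norm <= radius:
        return x
    return (radius * x[0] / norm, radius * x[1] / norm)


def project_halfspace(x, a=(1.0, 0.0), b=0.5):
    """Euclidean projection onto the half-space {z : <a, z> >= b}."""
    gap = b - (a[0] * x[0] + a[1] * x[1])
    if gap <= 0.0:
        return x
    sq = a[0] * a[0] + a[1] * a[1]
    return (x[0] + gap * a[0] / sq, x[1] + gap * a[1] / sq)


def random_projections(x0, num_iter, seed=0):
    """Iterate x_{n+1} = proj_{X_i}(x_n) with i drawn uniformly at random.

    Returns the whole path, starting point included.
    """
    rng = random.Random(seed)
    projections = [project_ball, project_halfspace]
    x = (float(x0[0]), float(x0[1]))
    path = [x]
    for _ in range(num_iter):
        x = rng.choice(projections)(x)
        path.append(x)
    return path


if __name__ == "__main__":
    path = random_projections((-3.0, 4.0), 50)
    print("last iterate:", path[-1])
```

Since every projection is nonexpansive and fixes each point of the intersection, the distance from the iterates to any fixed feasible point never increases along the path, which is the deterministic backbone of the stability analysis.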
A generalization of the above feasibility problem is the programming problem

$$\min_x F(x) \quad \text{s.t. } x \in \bigcap_{i=1}^m X_i \tag{1.4}$$

where $F$ is a closed proper convex function. Here we set $f(0, \cdot) = F$ and $f(i, \cdot) = \iota_{X_i}$
for $1 \le i \le m$ and choose randomly the variable $\xi_{n+1}$ on the set $E = \{0, 1, \dots, m\}$
according to some discrete distribution $\mu = \sum_{i=0}^m \alpha_i \delta_i$ for some positive coefficients $\alpha_i$.
The use of algorithm (1.3) with this choice of $f$ leads to an algorithm where either
$\operatorname{prox}_{\lambda_n F}$ is applied to the current estimate or a projection onto one of the sets $X_i$ is
done, depending on the outcome of $\xi_{n+1}$. A refinement consists in assuming that the
function $F$ is itself an expectation of the form $F(x) = \mathbb{E}(f(Z, x))$ for some random
variable $Z$. In this case, the previous algorithm can be extended by substituting
$\operatorname{prox}_{\lambda_n F}$ with a random version $\operatorname{prox}_{\lambda_n f(Z_{n+1}, \cdot)}$ where $(Z_n)_n$ are iid copies of $Z$. This
example will be discussed in detail in Section 6.

Apart from convex minimization problems, Algorithm (1.1) also finds applications
in minimax problems, i.e., when the aim is to search for a saddle point of a given
function $L$ [15], [37]. Suppose that $\mathcal{H}$ is a cartesian product of two Hilbert spaces
$\mathcal{H}_1 \times \mathcal{H}_2$ and define $\ell : E \times \mathcal{H} \to [-\infty, +\infty]$ such that $\ell(s, x, y)$ is convex in $x$ and
concave in $y$ and $\ell(s, \cdot)$ is proper and closed in the sense of [37]. Consider the problem
of finding a saddle point $(x, y)$ of the function $L = \mathbb{E}(\ell(\xi_1, \cdot))$, i.e., $(x, y) \in \arg\operatorname{minimax} L$.
For every $s \in E$ and $z \in \mathcal{H}$ of the form $z = (x, y)$, define $A(s, x, y)$ as the set of points
$(u, v)$ such that for every $(x', y')$,

$$\ell(s, x', y) - \langle u, x' \rangle + \langle v, y \rangle \ \ge\ \ell(s, x, y) - \langle u, x \rangle + \langle v, y \rangle \ \ge\ \ell(s, x, y') - \langle u, x \rangle + \langle v, y' \rangle.$$

In that case, the operator $A(s, \cdot)$ is maximal monotone for every $s$, and the stochastic
proximal point algorithm (1.1) reads

$$(x_{n+1}, y_{n+1}) = \arg\operatorname{minimax}_{(x, y)} \left\{ \ell(\xi_{n+1}, x, y) + \frac{\|x - x_n\|^2}{2\lambda_n} - \frac{\|y - y_n\|^2}{2\lambda_n} \right\}.$$
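
As a small worked instance of the saddle-point step, consider the bilinear function $\ell(s, x, y) = s\,x\,y$ on $\mathbb{R} \times \mathbb{R}$, an assumption made purely for illustration. The regularized minimax problem is then strongly convex in $x$ and strongly concave in $y$, and its stationarity conditions form a 2x2 linear system that can be solved by hand. The function name `saddle_prox_step` is hypothetical.

```python
def saddle_prox_step(s, lam, x, y):
    """One step (x, y) -> arg minimax over (x', y') of
        s * x' * y' + (x' - x)^2 / (2 lam) - (y' - y)^2 / (2 lam)
    for the bilinear function l(s, x', y') = s * x' * y' on R x R.

    Stationarity gives the linear system
        x' + lam * s * y' = x,
        y' - lam * s * x' = y,
    which is solved here in closed form.
    """
    det = 1.0 + (lam * s) ** 2
    return (x - lam * s * y) / det, (y + lam * s * x) / det


if __name__ == "__main__":
    x, y = 1.0, -3.0
    for n in range(1, 6):
        x, y = saddle_prox_step(2.0, 1.0 / n, x, y)
        print(n, x, y)
```

In this example the operator $A(s, x, y) = (s y, -s x)$ is a scaled rotation, so each step multiplies the squared norm of $(x, y)$ by $1/(1 + \lambda^2 s^2)$, in line with the rotation counterexample of the introduction.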


As a further extension, Algorithm (1.1) can be used to solve variational
inequalities. Let $X = \bigcap_{i=1}^m X_i$ be defined as above and consider the problem of finding $x^* \in X$
such that

$$\forall x \in X, \quad \langle F(x^*), x - x^* \rangle \ge 0 \tag{1.5}$$

where $F : \mathcal{H} \to \mathcal{H}$ is monotone and, for simplicity, single-valued (extension to
set-valued $F$ is also possible in our framework). Applications of (1.5) are numerous. We
refer to [22] for an overview. Specific applications include game theory where typically,
a Nash equilibrium has to be found amongst users having individual constraints and
observing possibly stochastic rewards [42]. Other examples such as matrix minimax
problems are described in [21]. Similarly to the programming problem (1.4), the
application of the stochastic proximal point algorithm to the variational inequality (1.5)
yields the following algorithm. Depending on the outcome of a random variable
$\xi_{n+1} \in \{0, \dots, m\}$, a projection onto one of the sets $X_1, \dots, X_m$ is performed, or the
resolvent $(I + \lambda_n F)^{-1}$ is applied to the current estimate.

Also interesting is the case where the function $F$ in (1.5) is itself defined as an
expectation of the form $F(x) = \mathbb{E}(f(Z, x))$ where $f$ is $\mathcal{H}$-valued and $Z$ is a r.v. In
this case, the previous algorithm can be generalized by substituting the resolvent
$(I + \lambda_n F)^{-1}$ with its stochastic counterpart $(I + \lambda_n f(Z_{n+1}, \cdot))^{-1}$ where $(Z_n)_n$ are
iid copies of $Z$. The context of stochastic variational inequalities is investigated by
Juditsky et al., see [21], where a stochastic mirror-prox algorithm is provided. The
algorithm of [21] uses general prox-functions and allows for a possible bias in the
estimation of $F$. In [21], $X$ is supposed to be a compact subset of a Euclidean space, $m$ is equal
to one, and $\|F(x) - F(y)\|_* \le L\|x - y\| + M$ (for some arbitrary norm $\|\cdot\|$ and the
corresponding dual norm $\|\cdot\|_*$) where $L$, $M$ are constants that are known by the user.
Moreover, a variance bound of the form $\mathbb{E}(\|f(Z, x) - F(x)\|_*^2) \le \sigma^2$ is supposed to hold
uniformly in $x$. Then, using a constant step size depending on $L$, $M$ and the expected
number of iterations of the algorithm, the authors prove that the algorithm achieves
the optimal convergence rate. Note that the black-box model used in the present paper
is different from [21] in the sense that we are making an implicit use of $f(Z_{n+1}, \cdot)$
instead of an explicit one as in [21]. In our work, this allows us to prove the almost
sure convergence of the algorithm under weaker assumptions than [21]. On the other
hand, the price to pay with our approach is the absence of convergence rate certificates.
Also related to our framework is the recent work [46]. An algorithm similar to ours
is proposed, $F$ being moreover assumed to be strongly monotone and to verify the
Lipschitz-like property $\mathbb{E}(\|f(Z, x) - f(Z, y)\|^2) \le C\|x - y\|^2$. These assumptions are
not needed in our approach.

Organization and contributions. The paper is organized as follows. After
some preliminaries in Section 2, the main algorithm is introduced in Section 3. The
aim of Section 4 is to establish that the algorithm is stable in the sense that the
sequence $(x_n)$ is bounded almost surely. We actually prove a stronger result: for any
zero $x^*$ of $\mathcal{A}$, the sequence $\|x_n - x^*\|$ converges almost surely. This point is the first
key element to prove the weak convergence in average of the algorithm. The second
element is provided in Section 5 where it is shown that any weak cluster point of the
weighted averaged sequence $(\bar x_n)$ is a zero of $\mathcal{A}$. Putting together these two arguments
and using Opial's lemma [31], we conclude that, almost surely, $(\bar x_n)$ converges weakly
to a zero of $\mathcal{A}$. The proofs of Section 5 rely on two major assumptions. First, the
operator $\mathcal{A}$ is assumed maximal, as discussed above. Second, the averaged sequence of
(random) Yosida approximations evaluated at the iterates is supposed to be uniformly
integrable with probability one. The latter assumption is easily verifiable when all
operators are supposed to have the same domain. The case where operators have
different domains is more involved. We introduce a linear regularity assumption on
the set of domains of the operators inspired by [7] (a similar assumption is also
used in [45]). We provide estimates of the distance between the iterate $x_n$ and the
essential intersection of the domains. These estimates make it possible to verify the uniform
integrability condition, and yield the almost sure weak convergence in average of the
algorithm in the general case.

In Section 6, we study applications to convex programming. We use our results
to prove weak convergence in average of $(x_n)$ given by (1.3) to a minimizer of
$x \mapsto \mathbb{E}(f(\xi_1, x))$. As an illustration, we address the problem

$$\min\ \mathbb{E}(f(\xi_1, x)) \quad \text{w.r.t. } x \in \bigcap_{i=1}^m X_i$$

where $X_1, \dots, X_m$ are closed convex sets of $\mathcal{H}$ and $f(s, \cdot)$ is a convex function on
$\mathcal{H} \to \mathbb{R}$ for each $s \in E$. We propose a random algorithm quite similar to [45] and
whose convergence in average can be established under verifiable conditions.
2. Preliminaries.

Random closed sets. Let $\mathcal{H}$ be a separable Hilbert space (identified with its
dual) equipped with its Borel $\sigma$-algebra $\mathcal{B}(\mathcal{H})$. We denote by $\|x\|$ the Euclidean norm
of any $x \in \mathcal{H}$ and by $d(x, Q) = \inf\{\|y - x\| : y \in Q\}$ the distance between a point
$x \in \mathcal{H}$ and a set $Q \in 2^{\mathcal{H}}$ (equal to $+\infty$ when $Q = \emptyset$). We denote by $\operatorname{cl}(Q)$ the closure
of $Q$. We note $|Q| = \sup\{\|x\| : x \in Q\}$.

Let $(T, \mathcal{T})$ be a measurable space. Let $\Gamma : T \to 2^{\mathcal{H}}$ be a multifunction such that
$\Gamma(t)$ is a closed set for all $t \in T$. The domain of $\Gamma$ is denoted by
$\operatorname{dom}(\Gamma) = \{t \in T : \Gamma(t) \neq \emptyset\}$. The graph of $\Gamma$ is denoted by $\operatorname{gr}(\Gamma) = \{(t, x) : x \in \Gamma(t)\}$.

We say that $\Gamma$ is $\mathcal{T}$-measurable (or Effros-measurable) if $\{t \in T : \Gamma(t) \cap U \neq \emptyset\} \in \mathcal{T}$
for each open set $U \subset \mathcal{H}$. This is equivalent to saying that for any $x \in \mathcal{H}$, the mapping
$t \mapsto d(x, \Gamma(t))$ is a random variable [16], [26]. We say that $\Gamma$ is graph-measurable if
$\operatorname{gr}(\Gamma) \in \mathcal{T} \otimes \mathcal{B}(\mathcal{H})$. Effros-measurability implies graph measurability and the converse
is true if $(T, \mathcal{T})$ is complete for some $\sigma$-finite measure [16, Chapter III], [26, Theorem
2.3, pp.28].

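Given a probability measure $\nu$ on $(T, \mathcal{T})$, a function $\phi : T \to \mathcal{H}$ is called a
measurable selection of $\Gamma$ if $\phi$ is $\mathcal{T}/\mathcal{B}(\mathcal{H})$-measurable and if $\phi(t) \in \Gamma(t)$ for all $t$ $\nu$-a.e.
We denote by $\mathcal{S}(\Gamma)$ the set of measurable selections of $\Gamma$. If $\Gamma$ is measurable,
the measurable selection theorem states that $\mathcal{S}(\Gamma) \neq \emptyset$ if and only if $\Gamma(t) \neq \emptyset$ for
all $t$ $\nu$-a.e. [26, Theorem 2.13, pp.32], [5, Theorem 8.1.3]. For any $p \ge 1$, we denote
by $L^p(T, \mathcal{H}, \nu)$ the set of measurable functions $\phi : T \to \mathcal{H}$ such that $\int \|\phi\|^p d\nu < \infty$.
We set $\mathcal{S}^p(\Gamma) = \mathcal{S}(\Gamma) \cap L^p(T, \mathcal{H}, \nu)$. The Aumann integral of the measurable map $\Gamma$
is the set

$$\int \Gamma\, d\nu = \left\{ \int \phi\, d\nu \ :\ \phi \in \mathcal{S}^1(\Gamma) \right\}$$

where $\int \phi\, d\nu$ is the Bochner integral of $\phi$.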

Monotone operators. An operator $A : \mathcal{H} \to 2^{\mathcal{H}}$ is said to be monotone if
$\forall (x, y) \in \operatorname{gr}(A)$, $\forall (x', y') \in \operatorname{gr}(A)$, $\langle y - y', x - x' \rangle \ge 0$. It is said to be strongly monotone with modulus
$\alpha$ if the inequality $\langle y - y', x - x' \rangle \ge 0$ can be replaced by $\langle y - y', x - x' \rangle \ge \alpha\|x - x'\|^2$.
The operator $A$ is maximal monotone if it is monotone and if for any other monotone
operator $A' : \mathcal{H} \to 2^{\mathcal{H}}$, $\operatorname{gr}(A) \subset \operatorname{gr}(A')$ implies $A = A'$. A maximal monotone operator
$A$ has closed convex images and $\operatorname{gr}(A)$ is closed [9, pp. 300]. We denote the identity
by $I : x \mapsto x$. For some $\lambda > 0$, the resolvent of $A$ is the operator $J_\lambda = (I + \lambda A)^{-1}$ or
equivalently: $y \in J_\lambda(x)$ if and only if $(x - y)/\lambda \in A(y)$. The Yosida approximation of $A$
is the operator $A_\lambda = (I - J_\lambda)/\lambda$. Assume from now on that $A$ is a maximal monotone
operator. Then $J_\lambda$ is a single valued map on $\mathcal{H} \to \mathcal{H}$ and is firmly non-expansive in
the sense that $\langle J_\lambda(x) - J_\lambda(y), x - y \rangle \ge \|J_\lambda(x) - J_\lambda(y)\|^2$ for every $(x, y) \in \mathcal{H}^2$. The
Yosida approximation $A_\lambda$ is $1/\lambda$-Lipschitz continuous and satisfies $A_\lambda(x) \in A(J_\lambda(x))$
for every $x \in \mathcal{H}$ [25], [9, Corollary 23.10]. For any $x \in \operatorname{dom}(A)$, we denote by $A_0(x)$
the element of least norm in $A(x)$, i.e., $A_0(x) = \operatorname{proj}_{A(x)}(0)$ where $\operatorname{proj}_C$ represents
the projection operator onto a closed convex set $C$. When $A$ is maximal monotone
and $x \in \operatorname{dom}(A)$, then $\|A_\lambda(x)\| \le \|A_0(x)\|$. In that case, $A_\lambda(x)$ and $J_\lambda(x)$ respectively
converge to $A_0(x)$ and $x$ as $\lambda \downarrow 0$ [9, Section 23.5].
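
The objects just defined can be computed by hand for the subdifferential of the absolute value on $\mathbb{R}$, which is a standard example and not one taken from this paper. The sketch below, with hypothetical function names, implements the resolvent (soft-thresholding), the Yosida approximation and the least-norm selection, so that the properties $A_\lambda(x) \in A(J_\lambda(x))$ and $|A_\lambda(x)| \le |A_0(x)|$ can be checked numerically.

```python
def resolvent_abs(lam, x):
    """Resolvent J_lam = (I + lam * A)^{-1} of A = subdifferential of |.| on R,
    i.e. the soft-thresholding map."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def yosida_abs(lam, x):
    """Yosida approximation A_lam = (I - J_lam) / lam of the same operator.
    It equals x / lam on [-lam, lam] and the sign of x outside."""
    return (x - resolvent_abs(lam, x)) / lam


def least_norm_abs(x):
    """Element of least norm A_0(x) in the subdifferential of |.| at x:
    the sign of x, and 0 at the origin."""
    return float((x > 0) - (x < 0))


if __name__ == "__main__":
    for lam in (1.0, 0.1, 0.01):
        # As lam decreases, A_lam(x) approaches A_0(x) and J_lam(x) approaches x.
        print(lam, yosida_abs(lam, 0.3), resolvent_abs(lam, 0.3))
```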

Random convex functions. A function $f : E \times \mathcal{H} \to (-\infty, +\infty]$ is called a
normal convex integrand if it is $\mathcal{E} \otimes \mathcal{B}(\mathcal{H})$-measurable and if $f(s, \cdot)$ is lower
semicontinuous, proper and convex for each $s \in E$ [39]. For such a function $f$, we define

$$F(x) = \int f(s, x)\, d\mu(s) \tag{2.1}$$

where the above integral is defined as the sum

$$\int f(s, x)^+\, d\mu(s) - \int f(s, x)^-\, d\mu(s)$$

where we use the notation $a^\pm = \max(\pm a, 0)$ and the convention $(+\infty) - (+\infty) = +\infty$.
The subdifferential operator $\partial f : E \times \mathcal{H} \to 2^{\mathcal{H}}$ is defined for all $(s, x) \in E \times \mathcal{H}$ by

$$\partial f(s, x) = \{u \in \mathcal{H} : \forall y \in \mathcal{H},\ f(s, y) \ge f(s, x) + \langle u, y - x \rangle\}.$$


3. Algorithm. 

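3.1. Description. Let $(E, \mathcal{E}, \mu)$ be a complete probability space and let $\mathcal{H}$ be a
separable Hilbert space equipped with its Borel $\sigma$-algebra $\mathcal{B}(\mathcal{H})$. Consider a mapping
$A : E \times \mathcal{H} \to 2^{\mathcal{H}}$ and define for any $\lambda > 0$ the resolvent and the Yosida approximation
of $A$ as the mappings $J_\lambda$ and $A_\lambda$ respectively defined on $E \times \mathcal{H} \to 2^{\mathcal{H}}$ by

$$J_\lambda(s, x) = (I + \lambda A(s, \cdot))^{-1}(x)$$

$$A_\lambda(s, x) = (x - J_\lambda(s, x))/\lambda$$

for all $(s, x) \in E \times \mathcal{H}$.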

Assumption 1.

(i) For every $s \in E$ $\mu$-a.e., $A(s, \cdot)$ is maximal monotone.

(ii) For any $\lambda > 0$ and $x \in \mathcal{H}$, $J_\lambda(\cdot, x)$ is $\mathcal{E}/\mathcal{B}(\mathcal{H})$-measurable.

By [4, Lemme 2.1], the second point is equivalent to the assumption that $A$ is
$\mathcal{E} \otimes \mathcal{B}(\mathcal{H})$-Effros measurable. Also, by the same result, the statement "for any $\lambda > 0$"
in Assumption 1(ii) can be equivalently replaced by "there exists $\lambda > 0$". As $A(s, \cdot)$ is
maximal monotone, $J_\lambda(s, \cdot)$ is a single-valued continuous map for each $s \in E$. Thus,
$J_\lambda$ is a Carathéodory map. As such, $J_\lambda$ is $\mathcal{E} \otimes \mathcal{B}(\mathcal{H})/\mathcal{B}(\mathcal{H})$-measurable by [5, Lemma
8.2.6].

Consider another probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $(\xi_n : n \in \mathbb{N}^*)$ be a sequence
of random variables on $\Omega \to E$. For an arbitrary initial point $x_0 \in \mathcal{H}$ (assumed fixed
throughout the paper), we consider the following iterations

$$x_{n+1} = J_{\lambda_n}(\xi_{n+1}, x_n). \tag{3.1}$$

Assumption 2.

(i) The sequence $(\lambda_n : n \in \mathbb{N})$ is positive and belongs to $\ell^2 \setminus \ell^1$.

(ii) The random sequence $(\xi_n : n \in \mathbb{N}^*)$ is independent and identically distributed
with probability distribution $\mu$.

Let $\mathcal{F}_n$ be the $\sigma$-algebra generated by the r.v. $\xi_1, \dots, \xi_n$. We denote by $\mathbb{E}$ the
expectation on $(\Omega, \mathcal{F}, \mathbb{P})$ and by $\mathbb{E}_n = \mathbb{E}(\,\cdot\, | \mathcal{F}_n)$ the conditional expectation w.r.t. $\mathcal{F}_n$.

3.2. Mean operator. For any $x \in \mathcal{H}$, we define $\mathcal{S}_A(x) = \mathcal{S}(A(\cdot, x))$ as the
set of measurable selections of $A(\cdot, x)$. We define similarly $\mathcal{S}_A^p(x) = \mathcal{S}^p(A(\cdot, x))$.
For each $s \in E$, we set $D_s = \operatorname{dom}(A(s, \cdot))$. Following [19], we define the essential
intersection (or continuous intersection) of the domains $D_s$ as

$$\mathcal{D} = \bigcup_{N \in \mathcal{N}}\ \bigcap_{s \in E \setminus N} D_s$$

where $\mathcal{N}$ is the set of $\mu$-negligible subsets of $E$. Otherwise stated, a point $x$ belongs
to $\mathcal{D}$ if $x \in D_s$ for every $s$ outside a negligible set. We define

$$\mathcal{A}(x) = \int A(s, x)\, d\mu(s).$$

For any $s \in E$ and any $x \in D_s$, we define $A_0(s, x) = \operatorname{proj}_{A(s, x)}(0)$ as the element of
least norm in $A(s, x)$.

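Lemma 3.1. Under Assumption 1, $\mathcal{A}$ is monotone and has convex values.
Moreover, if $\int \|A_0(s, x)\|\, d\mu(s) < \infty$ for all $x \in \mathcal{D}$, then

$$\operatorname{dom}(\mathcal{A}) = \mathcal{D}.$$

Proof. The first point is clear. For any $x \in \mathcal{D}$, $A_0(\cdot, x)$ is well defined
$\mu$-a.e. and is measurable as the pointwise limit of the measurable functions $A_\lambda(\cdot, x)$ for
$\lambda \downarrow 0$. By the measurable selection theorem, $\mathcal{D} = \operatorname{dom}(\mathcal{S}_A)$. On the other hand,
$\operatorname{dom}(\mathcal{A}) = \operatorname{dom}(\mathcal{S}_A^1) \subset \mathcal{D}$. For any $x \in \mathcal{D}$, $A_0(\cdot, x)$ is an integrable selection of $A(\cdot, x)$
by the standing hypothesis. Thus, $x \in \operatorname{dom}(\mathcal{A})$. As a consequence, $\mathcal{D} \subset \operatorname{dom}(\mathcal{A})$. □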

Example 1. Consider the case where $\mu$ is a finitely supported measure, say
$\operatorname{supp}(\mu) = \{1, \dots, m\}$ for some integer $m \ge 1$. Set $w_i = \mu(\{i\})$ for each $i$. Then
$\mathcal{A} = \sum_{i=1}^m w_i A(i, \cdot)$ and its domain is equal to

$$\mathcal{D} = \bigcap_{i=1}^m D_i.$$

Moreover, if the interiors of the respective sets $D_1, \dots, D_m$ have a non-empty
intersection, then $\mathcal{A}$ is maximal by [38].
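
A toy instance of Example 1 on the real line may help to visualize the weighted Minkowski sum. In the sketch below, which is an illustration under assumed choices and not an example from the paper, $m = 2$, the first operator is the subdifferential of $|\cdot|$, the second is $x \mapsto x - 1$, and set values are encoded as closed intervals `(lo, hi)`. All function names are hypothetical.

```python
def abs_subdiff(x):
    """Subdifferential of |.| at x, returned as an interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)


def shifted_identity(x):
    """Single-valued operator x -> x - 1, returned as a degenerate interval."""
    return (x - 1.0, x - 1.0)


def mean_operator(x, operators, weights):
    """Weighted Minkowski sum sum_i w_i * A(i, x) of interval-valued operators
    (weights are assumed to be positive)."""
    lo = sum(w * op(x)[0] for op, w in zip(operators, weights))
    hi = sum(w * op(x)[1] for op, w in zip(operators, weights))
    return (lo, hi)


if __name__ == "__main__":
    ops, w = [abs_subdiff, shifted_identity], [0.5, 0.5]
    for x in (-1.0, 0.0, 1.0):
        print(x, mean_operator(x, ops, w))
```

With equal weights, the mean operator at the origin is the interval $[-1, 0]$, which contains zero, so the origin is a zero of the mean operator although it is not a zero of the second operator taken alone. Both domains are the whole line, so the interiority condition of Example 1 holds trivially.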

Example 2. Set $\mathcal{H} = \mathbb{R}^d$. Assume that $A$ is non-empty valued and that for all $x \in \mathcal{H}$,
$|A(\cdot, x)| \le g(\cdot)$ for some $g \in L^1(E, \mathbb{R}, \mu)$. Then $\mathcal{A}$ is non-empty (convex) valued and
has a closed graph by [47]. Thus $\mathcal{A}$ is maximal monotone by [6, pp. 45].

Example 3. Let $f : E \times \mathcal{H} \to (-\infty, +\infty]$ be a normal convex integrand and
assume that its integral functional $F$ given by (2.1) is proper. Then $F$ is convex
and lower semicontinuous [44]. Set $A(s, x) = \partial f(s, x)$. Assume that the interchange
between expectation and subdifferential operators holds, i.e.,

$$\int \partial f(s, x)\, d\mu(s) = \partial \int f(s, x)\, d\mu(s),$$

otherwise stated, $\mathcal{A}(x) = \partial F(x)$. Then, as $F$ is proper, convex and lower
semicontinuous, it follows that $\mathcal{A}$ is maximal monotone [9, Theorem 21.2]. Sufficient conditions
for the interchange can be found in [34]. Assume that $F(x) < +\infty$ for every $x$ such
that $x \in \operatorname{dom} f(s, \cdot)$ $\mu$-almost everywhere. Suppose that $F$ is continuous at some point
and that the set valued function $s \mapsto \operatorname{cl}(\operatorname{dom} f(s, \cdot))$ is constant almost everywhere.
Then the identity $\mathcal{A}(x) = \partial F(x)$ holds.

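We denote by $\operatorname{zer}(\mathcal{A}) = \{x \in \mathcal{H} : 0 \in \mathcal{A}(x)\}$ the set of zeroes of $\mathcal{A}$. We define for
each $p \ge 1$

$$Z_A(p) = \left\{x \in \mathcal{H} : \exists \phi \in \mathcal{S}_A^p(x) : \int \phi\, d\mu = 0\right\}.$$

For any $p \ge 1$, $Z_A(p) \subset Z_A(1)$ and $Z_A(1) = \operatorname{zer}(\mathcal{A})$.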

3.3. Outline of the proofs. Before going into the details, we first provide an 
informal overview of the proof structure without insisting on the hypotheses for the 
moment. 

We start by showing two separate results in Sections 4.1 and 4.2 respectively,
which we merge in Section 4.3. The first result (Proposition 1) states that almost
surely, $\lim_{n \to \infty} \|x_n - x^*\|$ exists for every $x^* \in Z_A(2)$. In particular, the sequence $(x_n)$
is bounded with probability one whenever $Z_A(2)$ is non-empty. The second result
(Theorem 1) states the following: when $\mathcal{A}$ is maximal, all weak cluster points of the
averaged sequence $(\bar x_n)$ are zeroes of $\mathcal{A}$, almost surely on the event

$$\left\{ n \mapsto \frac{\sum_{k \le n} \lambda_k \|A_{\lambda_k}(\cdot, x_k)\|}{\sum_{k \le n} \lambda_k}\ \text{ is uniformly integrable} \right\}. \tag{3.2}$$

Assuming that $\operatorname{zer}(\mathcal{A}) \subset Z_A(2)$, the above results can be put together by a straightforward
application of Opial's lemma (see Lemma 4.3). Almost surely on the event (3.2),
$(\bar x_n)$ converges weakly to a point in $\operatorname{zer}(\mathcal{A})$. The latter result is stated in Theorem 2.
In order to complete the convergence proof, the aim is therefore to provide
verifiable conditions under which the event (3.2) is realized almost surely. This point is
addressed in Section 5.

Checking that (3.2) holds w.p.1 is relatively easy in the special case where the
domains $D_s$ are all equal to the same set $\mathcal{D}$. Using the inequality $\|A_{\lambda_k}(\cdot, x_k)\| \le
\|A_0(\cdot, x_k)\|$ and assuming that for every bounded set $K$, the family of measurable
functions $(\|A_0(\cdot, x)\|)_{x \in K \cap \mathcal{D}}$ is uniformly integrable, the result follows (see
Corollary 1). On the other hand, when the domains $D_s$ are not equal to the same set
$\mathcal{D}$, more developments are needed to prove that the event (3.2) is indeed realized
w.p.1. This point is addressed in Section 5.2 and the main result of the paper is
eventually provided in Theorem 3. As opposed to the case of identical domains, the
difficulty comes from the fact that the inequality $\|A_{\lambda_k}(\cdot, x_k)\| \le \|A_0(\cdot, x_k)\|$ holds
only if $x_k \in \mathcal{D}$, which has no reason to be satisfied in the case of different domains.
Instead, a solution is to pick some $z_k \in \mathcal{D}$ close enough to $x_k$ in the sense that
$\|z_k - x_k\| \le 2 d(x_k, \mathcal{D})$. Using that $A_\lambda(s, \cdot)$ is $1/\lambda$-Lipschitz continuous for every $s$,
one has

$$\|A_{\lambda_k}(s, x_k)\| \le \|A_{\lambda_k}(s, z_k)\| + \frac{2\, d(x_k, \mathcal{D})}{\lambda_k}. \tag{3.3}$$

As $z_k \in \mathcal{D}$, the inequality $\|A_{\lambda_k}(\cdot, z_k)\| \le \|A_0(\cdot, z_k)\|$ can be used and the first term
in the right-hand side of (3.3) can be handled similarly to the previous case where the
domains $D_s$ were assumed identical. In order to establish that (3.2) is realized w.p.1,
the remaining task is therefore to provide an estimate of the second term $2 d(x_k, \mathcal{D})/\lambda_k$. The
latter estimate is provided in Proposition 2, which relies heavily on the mathematical
developments of Lemma 5.1.

In Section 6, we particularize the algorithm to the case of convex programming.
The proofs of the section mainly consist in checking the conditions of application of
the results of Section 5.

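4. Stability and cluster points. The following simple lemma will be used
twice.

Lemma 4.1. Let Assumption 1 hold true. Consider $u \in \mathcal{H}$, $\phi \in \mathcal{S}_A(u)$, $x \in \mathcal{H}$,
$\lambda > 0$, $\beta > 0$. Then, for every $s$ $\mu$-a.e.,

$$\langle A_\lambda(s, x) - \phi(s), x - u \rangle \ \ge\ \lambda(1 - \beta)\|A_\lambda(s, x)\|^2 - \frac{\lambda}{4\beta}\|\phi(s)\|^2. \tag{4.1}$$

Proof. As $\langle A_\lambda(s, x) - \phi(s), J_\lambda(s, x) - u \rangle \ge 0$ for all $s$ $\mu$-a.e., we obtain

$$\langle A_\lambda(s, x) - \phi(s), x - u \rangle \ \ge\ \langle A_\lambda(s, x) - \phi(s), x - J_\lambda(s, x) \rangle$$
$$= \lambda \langle A_\lambda(s, x) - \phi(s), A_\lambda(s, x) \rangle$$
$$= \lambda \|A_\lambda(s, x)\|^2 - \lambda \langle \phi(s), A_\lambda(s, x) \rangle.$$

Using $\langle a, b \rangle \le \beta\|a\|^2 + \frac{1}{4\beta}\|b\|^2$ with $a = A_\lambda(s, x)$ and $b = \phi(s)$, the result is proved. □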
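
The inequality (4.1) can be sanity-checked numerically on a simple family of operators. The sketch below assumes the linear operators $A(s, x) = s x$ on $\mathbb{R}$ with $s \ge 0$, for which $J_\lambda(s, x) = x/(1 + \lambda s)$, $A_\lambda(s, x) = s x/(1 + \lambda s)$ and the only selection at $u$ is $\phi(s) = s u$. This family and the function names are illustrative assumptions, not objects from the paper.

```python
def yosida_linear(lam, s, x):
    """Yosida approximation of the linear monotone operator x -> s * x (s >= 0)."""
    return s * x / (1.0 + lam * s)


def lemma_gap(lam, beta, s, x, u):
    """Left-hand side minus right-hand side of (4.1), with phi(s) = s * u.
    Lemma 4.1 asserts that this quantity is nonnegative."""
    a_lam = yosida_linear(lam, s, x)
    phi = s * u
    lhs = (a_lam - phi) * (x - u)
    rhs = lam * (1.0 - beta) * a_lam ** 2 - lam / (4.0 * beta) * phi ** 2
    return lhs - rhs


if __name__ == "__main__":
    print(lemma_gap(1.0, 0.5, 1.0, 1.0, 0.0))
    print(lemma_gap(1.0, 0.5, 1.0, 1.0, 2.0))
```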

4.1. Boundedness. The following proposition establishes that the stochastic
proximal point algorithm is stable whenever $Z_A(2)$ is non-empty.

Proposition 1. Let Assumptions 1, 2 hold true. Suppose $Z_A(2) \neq \emptyset$ and let
$(x_n)$ be defined by (3.1). Then,

(i) There exists an event $B \in \mathcal{F}$ such that $\mathbb{P}(B) = 1$ and for every $\omega \in B$ and
every $x^* \in Z_A(2)$, the sequence $(\|x_n(\omega) - x^*\|)$ converges as $n \to \infty$.

(ii) $\mathbb{E}\left(\sum_n \lambda_n^2 \int \|A_{\lambda_n}(s, x_n)\|^2 d\mu(s)\right) < \infty$,

(iii) For any $p \in \mathbb{N}^*$ such that $Z_A(2p) \neq \emptyset$, $\sup_n \mathbb{E}(\|x_n\|^{2p}) < \infty$.

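Proof. Consider $u \in Z_A(2)$, $\phi \in \mathcal{S}_A^2(u)$ such that $\int \phi\, d\mu = 0$. Choose $0 < \beta \le \frac{1}{2}$.
Note that $x_{n+1} = x_n - \lambda_n A_{\lambda_n}(\xi_{n+1}, x_n)$. We expand

$$\|x_{n+1} - u\|^2 = \|x_n - u\|^2 + 2\langle x_{n+1} - x_n, x_n - u \rangle + \|x_{n+1} - x_n\|^2$$
$$= \|x_n - u\|^2 - 2\lambda_n \langle A_{\lambda_n}(\xi_{n+1}, x_n), x_n - u \rangle + \lambda_n^2 \|A_{\lambda_n}(\xi_{n+1}, x_n)\|^2.$$

Using Lemma 4.1, for all $s$ $\mu$-a.e.,

$$\langle A_{\lambda_n}(s, x_n), x_n - u \rangle \ \ge\ \lambda_n(1 - \beta)\|A_{\lambda_n}(s, x_n)\|^2 - \frac{\lambda_n}{4\beta}\|\phi(s)\|^2 + \langle \phi(s), x_n - u \rangle.$$

Therefore,

$$\|x_{n+1} - u\|^2 \le \|x_n - u\|^2 - \lambda_n^2(1 - 2\beta)\|A_{\lambda_n}(\xi_{n+1}, x_n)\|^2 + \frac{\lambda_n^2}{2\beta}\|\phi(\xi_{n+1})\|^2 - 2\lambda_n \langle \phi(\xi_{n+1}), x_n - u \rangle. \tag{4.2}$$

Take the conditional expectation of both sides of the inequality:

$$\mathbb{E}_n \|x_{n+1} - u\|^2 \le \|x_n - u\|^2 - \lambda_n^2(1 - 2\beta) \int \|A_{\lambda_n}(s, x_n)\|^2 d\mu(s) + \frac{c}{2\beta}\lambda_n^2$$

where we set $c = \int \|\phi\|^2 d\mu$ and used $\int \phi\, d\mu = 0$. By the Robbins-Siegmund theorem
(see [33, Theorem 1]) and choosing $0 < \beta < \frac{1}{2}$, we deduce that

$$\mathbb{E}\left(\sum_n \lambda_n^2 \int \|A_{\lambda_n}(s, x_n)\|^2 d\mu(s)\right) < \infty$$

(thus, point (ii) is proved), $\sup_n \mathbb{E}(\|x_n\|^2) < \infty$ and finally, the sequence $(\|x_n - u\|^2)$
converges almost surely as $n \to \infty$. Let $Q$ be a dense countable subset of $Z_A(2)$. There
exists $B \in \mathcal{F}$ such that $\mathbb{P}(B) = 1$ and for all $\omega \in B$, all $u \in Q$, $(\|x_n(\omega) - u\|)$ converges.
Consider $\omega \in B$ and $x^* \in Z_A(2)$. For any $\epsilon > 0$, choose $u \in Q$ such that $\|x^* - u\| \le \epsilon$
and define $\ell_u = \lim_{n \to \infty} \|x_n(\omega) - u\|$. Note that $\|x_n(\omega) - u\| \le \|x_n(\omega) - x^*\| + \epsilon$,
thus $\ell_u \le \liminf \|x_n(\omega) - x^*\| + \epsilon$. Similarly, $\|x_n(\omega) - x^*\| \le \|x_n(\omega) - u\| + \epsilon$, thus
$\limsup \|x_n(\omega) - x^*\| \le \ell_u + \epsilon$. Finally, $\limsup \|x_n(\omega) - x^*\| \le \liminf \|x_n(\omega) - x^*\| + 2\epsilon$.
As $\epsilon$ is arbitrary, we conclude that $(\|x_n(\omega) - x^*\|)$ converges. Point (i) is proved.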

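We prove point (iii) by induction. Set $u \in Z_A(2p)$. We have shown above that
$\sup_n \mathbb{E}(\|x_n - u\|^2) < \infty$. Consider an integer $q \le p$ such that $\sup_n \mathbb{E}(\|x_n - u\|^{2q-2}) < \infty$.
We will show that $\sup_n \mathbb{E}(\|x_n - u\|^{2q}) < \infty$ and the proof will be complete. Use
Equation (4.2) with $\beta = \frac{1}{2}$,

$$\mathbb{E}[\|x_{n+1} - u\|^{2q}] \le \mathbb{E}\left[\left(\|x_n - u\|^2 + \lambda_n^2\|\phi(\xi_{n+1})\|^2 - 2\lambda_n \langle \phi(\xi_{n+1}), x_n - u \rangle\right)^q\right]$$
$$= \sum_{k_1 + k_2 + k_3 = q} \binom{q}{k_1, k_2, k_3} T_n^{(k_1, k_2, k_3)} \tag{4.3}$$

where for any $k = (k_1, k_2, k_3)$ such that $k_1 + k_2 + k_3 = q$, we define

$$T_n^k = (-2)^{k_3} \lambda_n^{2k_2 + k_3}\, \mathbb{E}\left[\|x_n - u\|^{2k_1} \|\phi(\xi_{n+1})\|^{2k_2} \langle \phi(\xi_{n+1}), x_n - u \rangle^{k_3}\right].$$

Note that $T_n^{(q, 0, 0)} = \mathbb{E}[\|x_n - u\|^{2q}]$. We now prove that there exists a constant $c''$ such
that for any $k \neq (q, 0, 0)$, $|T_n^k| \le c'' \lambda_n^2$. Consider a fixed value of $k \neq (q, 0, 0)$ such
that $k_1 + k_2 + k_3 = q$ and consider the following cases.

• If $k_3 = 0$, then $k_1 \le q - 1$ and $k_2 \ge 1$. In that case,

$$|T_n^k| \le \lambda_n^{2k_2}\, \mathbb{E}(\|x_n - u\|^{2k_1}) \int \|\phi\|^{2k_2} d\mu \ \le\ \alpha \lambda_n^2\, \mathbb{E}(1 + \|x_n - u\|^{2q-2}) \int \|\phi\|^{2k_2} d\mu$$

where $\alpha$ is a constant chosen in such a way that $\lambda_n^{2k_2} \le \alpha \lambda_n^2$ for any $1 \le k_2 \le q$
and where we used the inequality $a^{k_1} \le 1 + a^{q-1}$ for any $a \ge 0$ and any $k_1 \le q - 1$. The constant
$c' = \alpha \sup_n \mathbb{E}(1 + \|x_n - u\|^{2q-2}) \int \|\phi\|^{2k_2} d\mu$ is finite and we have $|T_n^k| \le c' \lambda_n^2$.

• If $k_3 = 1$ and $k_2 = 0$, then $T_n^k = 0$ using that $\int \phi\, d\mu = 0$.

• In all remaining cases, $k_1 \le q - 2$ and $k_2 + k_3 \ge 2$. By the Cauchy-Schwarz inequality,

$$|T_n^k| \le 2^{k_3} \lambda_n^{2k_2 + k_3}\, \mathbb{E}\left[\|x_n - u\|^{2k_1 + k_3} \|\phi(\xi_{n+1})\|^{2k_2 + k_3}\right]$$
$$= 2^{k_3} \lambda_n^{2k_2 + k_3}\, \mathbb{E}\left[\|x_n - u\|^{2k_1 + k_3}\right] \int \|\phi\|^{2k_2 + k_3} d\mu.$$

Now $2k_2 + k_3 = k_2 + q - k_1 \le 2p$ and $2k_1 + k_3 = k_1 + q - k_2 \le k_1 + q \le 2q - 2$.
Using again that $\sup_n \mathbb{E}(1 + \|x_n - u\|^{2q-2}) < \infty$ and $\int \|\phi\|^{2p} d\mu < \infty$, we conclude
that there exists another constant $c'' \ge c'$ such that $|T_n^k| \le c'' \lambda_n^2$.

We have shown that $|T_n^{(k_1, k_2, k_3)}| \le c'' \lambda_n^2$ whenever $k_1 + k_2 + k_3 = q$ and $(k_1, k_2, k_3) \neq
(q, 0, 0)$. Bounding the rhs of (4.3), we obtain

$$\mathbb{E}[\|x_{n+1} - u\|^{2q}] \le \mathbb{E}[\|x_n - u\|^{2q}] + c'' \lambda_n^2$$

which in turn implies that $\sup_n \mathbb{E}[\|x_n - u\|^{2q}] < \infty$. □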
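4.2. Weak cluster points. For an arbitrary sequence $(a_n : n \in \mathbb{N})$, we use the
notation $\bar a_n$ to represent the weighted averaged sequence $\bar a_n = \sum_{k=1}^n \lambda_k a_k / \sum_{k=1}^n \lambda_k$.

Recall that a family $(f_i : i \in I)$ of measurable functions on $E \to \mathbb{R}_+$ is uniformly
integrable if

$$\lim_{a \to +\infty}\ \sup_{i \in I} \int_{\{f_i > a\}} f_i\, d\mu = 0.$$

Definition 4.2. We say that a sequence $(u_n) \in \mathcal{H}^{\mathbb{N}}$ has the property UI if the
sequence

$$\frac{\sum_{k=1}^n \lambda_k \|A_{\lambda_k}(\cdot, u_k)\|}{\sum_{k=1}^n \lambda_k} \qquad (n \in \mathbb{N}^*)$$

is uniformly integrable.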

Assumption 3. The monotone operator A is maximal. 

Note that Assumption 3 is satisfied in Examples 1, 2 and 3 above. 

Theorem 1. Let Assumptions 1-3 hold true and suppose that $Z_A(2) \neq \emptyset$. Consider the random sequence $(x_n)$ given by (3.1) with weighted averaged sequence $(\bar x_n)$. Let $G \in \mathcal{F}$ be an event such that for almost every $\omega \in G$, $(x_n(\omega))$ has the property UI. Then, there exists $B \in \mathcal{F}$ such that $\mathbb{P}(B) = 1$ and such that for every $\omega \in B \cap G$, all weak cluster points of the sequence $(\bar x_n(\omega))$ belong to $\mathrm{zer}(A)$.

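Proof. Denote $h_\lambda(x) = \int A_\lambda(s,x)\,d\mu(s)$ for any $\lambda > 0$, $x \in \mathcal{H}$. We justify the fact that $h_\lambda(x)$ is well defined. As $A$ is maximal, its domain contains at least one point $u \in \mathcal{H}$. For such a point $u$, there exists an integrable selection $\phi \in S^1_{A(\cdot,u)}$. As $A_\lambda(s,\cdot)$ is $\frac{1}{\lambda}$-Lipschitz continuous, $\|A_\lambda(s,x)\| \le \|A_\lambda(s,u)\| + \frac{1}{\lambda}\|x - u\|$. Moreover $\|A_\lambda(s,u)\| \le \|A_0(s,u)\| \le \|\phi(s)\|$ and since $\phi \in L^1(E,\mathcal{H},\mu)$, we obtain that $A_\lambda(\cdot\,,x) \in L^1(E,\mathcal{H},\mu)$. This implies that $h_\lambda(x)$ is well defined for all $x \in \mathcal{H}$, $\lambda > 0$. We write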


$$x_{n+1} = x_n - \lambda_n h_{\lambda_n}(x_n) + \lambda_n \eta_{n+1}$$

where $\eta_{n+1} = -A_{\lambda_n}(\xi_{n+1}, x_n) + h_{\lambda_n}(x_n)$ is an $\mathcal{F}_n$-adapted martingale increment sequence i.e., $\mathbb{E}_n(\eta_{n+1}) = 0$. Note that

$$\mathbb{E}_n\|\eta_{n+1}\|^2 \le \int \|A_{\lambda_n}(s,x_n)\|^2 d\mu(s)$$

and by Proposition 1(ii), it holds that $\sum_n \lambda_n^2\,\mathbb{E}_n\|\eta_{n+1}\|^2 < \infty$ almost surely. As a consequence, the $\mathcal{F}_n$-adapted martingale $\sum_{k<n} \lambda_k \eta_{k+1}$ converges almost surely to a random variable which is finite $\mathbb{P}$-a.e. Along with Proposition 1, this implies that there exists an event $B \in \mathcal{F}$ of probability one such that for any $\omega \in B \cap G$,

(i) $\sum_{k<n} \lambda_k \eta_{k+1}(\omega)$ converges,

(ii) $(x_n(\omega))$ is bounded,

(iii) $\sum_n \lambda_n^2 \int \|A_{\lambda_n}(s, x_n(\omega))\|^2 d\mu(s)$ is finite,

(iv) $(x_n(\omega))$ has the property UI.

From now on to the end of this proof, we fix such an $\omega$. As it is fixed, we omit the dependency w.r.t. $\omega$ to keep notations simple. We write for instance $x_n$ instead of $x_n(\omega)$, and what we refer to as constants can depend on $\omega$.

Let $(u,v) \in \mathrm{gr}(A)$ and consider $\phi \in S^1_{A(\cdot,u)}$ such that $v = \int \phi\, d\mu$. Denote by $\epsilon > 0$ an arbitrary positive constant.

We need some preliminaries. By (i), there exists an integer $N = N(\epsilon)$ such that for all $n \ge N$, $\|\sum_{k=N}^n \lambda_k \eta_{k+1}\| \le \epsilon$. Define $Y_n(s) = \|A_{\lambda_n}(s,x_n)\|$ and let $(\bar Y_n)$ represent the corresponding weighted averaged sequence. As $(\bar Y_n)$ is uniformly integrable, the same holds for the sequence $(\bar Y_n^{(N)})$ defined by

$$\bar Y_n^{(N)} = \frac{\sum_{k=N}^n \lambda_k Y_k}{\sum_{k=N}^n \lambda_k} .$$

In particular, there exists a constant $c$ such that

$$\sup_n \int \bar Y_n^{(N)} d\mu < c . \tag{4.4}$$

Moreover, by [29, Proposition II-5-2], there exists $\kappa_\epsilon > 0$ such that

$$\forall H \in \mathcal{E}, \quad \mu(H) < \kappa_\epsilon \ \Rightarrow\ \int_H \bar Y_n^{(N)} d\mu < \epsilon .$$

Since $\mu(\{\|\phi\| > K\}) \to 0$ as $K \to +\infty$, there exists $K_1$ (depending on $\epsilon$) such that for all $K \ge K_1$, $\mu(\{\|\phi\| > K\}) < \kappa_\epsilon$. For any such $K$,

$$\int_{\{\|\phi\| > K\}} \bar Y_n^{(N)} d\mu < \epsilon . \tag{4.5}$$

Denote $v_K = \int_{\{\|\phi\| \le K\}} \phi\, d\mu$. Note that $v_K \to v$ by the dominated convergence theorem. Thus, there exists $K_2$ such that for all $K \ge K_2$, $\|v_K - v\| < \epsilon$. From now on, we set $K \ge \max(K_1, K_2)$.

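Using an idea from [2], we define a sequence $(y_n : n \ge N)$ such that $y_N = x_N$ and $y_{n+1} = y_n - \lambda_n h_{\lambda_n}(x_n)$ for all $n \ge N$. By induction, $y_n = x_n - \sum_{k=N}^{n-1} \lambda_k \eta_{k+1}$. In particular, $\|y_n - x_n\| \le \epsilon$. We expand

$$\begin{aligned}
\|y_{n+1} - u\|^2 &= \|y_n - u\|^2 - 2\lambda_n \langle h_{\lambda_n}(x_n), y_n - u\rangle + \|y_{n+1} - y_n\|^2 \\
&\le \|y_n - u\|^2 - 2\lambda_n \langle h_{\lambda_n}(x_n), x_n - u\rangle + 2\epsilon\lambda_n \|h_{\lambda_n}(x_n)\| + \lambda_n^2 \|h_{\lambda_n}(x_n)\|^2 .
\end{aligned}$$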
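Define $\delta_{K,\lambda}(x) = \int_{\{\|\phi\| > K\}} A_\lambda(s,x)\, d\mu(s)$ and use Lemma 4.1 with $\beta = 1$:

$$\begin{aligned}
\langle h_{\lambda_n}(x_n) - v_K, x_n - u\rangle &\ge -\|\delta_{K,\lambda_n}(x_n)\|\,\|x_n - u\| - \frac{\lambda_n K^2}{4} \\
&\ge -c \int_{\{\|\phi\| > K\}} Y_n\, d\mu - \frac{\lambda_n K^2}{4}
\end{aligned}$$

where the constant $c$ is selected in such a way that $c \ge \sup_n \|x_n - u\|$. Using that $\|v_K - v\| < \epsilon$,

$$\langle h_{\lambda_n}(x_n) - v, x_n - u\rangle \ge -c\epsilon - c \int_{\{\|\phi\| > K\}} Y_n\, d\mu - \frac{\lambda_n K^2}{4} .$$

As a consequence,

$$\|y_{n+1} - u\|^2 \le \|y_n - u\|^2 - 2\lambda_n \langle v, x_n - u\rangle + r_n \tag{4.6}$$

where we define

$$\begin{aligned}
r_n &= 2c\epsilon\lambda_n + 2\lambda_n c\, t_{n,K} + 2\epsilon\lambda_n t_{n,0} + \lambda_n^2 s_n \\
s_n &= \|h_{\lambda_n}(x_n)\|^2 + K^2/2 \\
t_{n,a} &= \int_{\{\|\phi\| > a\}} Y_n\, d\mu \qquad (\forall a \in \{0, K\}) .
\end{aligned}$$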
For any $a \in \{0, K\}$, denote

$$\bar t_{n,a}^{(N)} = \frac{\sum_{k=N}^n \lambda_k t_{k,a}}{\sum_{k=N}^n \lambda_k} .$$

By inequality (4.4), $\bar t_{n,0}^{(N)} < c$. By inequality (4.5), $\bar t_{n,K}^{(N)} < \epsilon$. By point (iii), $\sum_n \lambda_n^2 \|h_{\lambda_n}(x_n)\|^2 < \infty$. Using Assumption 2(i), it follows that

$$\frac{\sum_{k=N}^n r_k}{\sum_{k=N}^n \lambda_k} < 6c\epsilon + o_n(1)$$

where $o_n(1)$ stands for a sequence which converges to zero as $n \to \infty$. Summing the inequalities (4.6) down to rank $N$, and dividing by $2\sum_{k=N}^n \lambda_k$, we obtain

$$0 \le -\frac{\sum_{k=N}^n \lambda_k \langle v, x_k - u\rangle}{\sum_{k=N}^n \lambda_k} + 3c\epsilon + o_n(1) .$$

Let $x$ be a weak cluster point of the weighted averaged sequence $(\bar x_n)$. Then, $x$ is also a weak cluster point of the sequence

$$\frac{\sum_{k=N}^n \lambda_k x_k}{\sum_{k=N}^n \lambda_k} .$$

We obtain $0 \le -\langle v, x - u\rangle + 3c\epsilon$. The inequality holds for any $\epsilon > 0$, thus $0 \le -\langle v, x - u\rangle$. As the inequality holds for any $(u,v) \in \mathrm{gr}(A)$ and $A$ is maximal monotone, this means that $(x, 0) \in \mathrm{gr}(A)$ [9, Theorem 20.21]. □

4.3. Weak ergodic convergence. The aim of Theorem 2 below is to merge 
Proposition 1 and Theorem 1 into a weak ergodic convergence result. We need the 
following condition to hold. 

Assumption 4. $\mathrm{zer}(A) \neq \emptyset$ and $\mathrm{zer}(A) \subset Z_A(2)$.

The condition $\mathrm{zer}(A) \neq \emptyset$ means that there exists $x^* \in \mathcal{H}$ for which one can find a selection $\phi$ of $A(\cdot\,,x^*)$ such that $\int \phi\, d\mu = 0$. The condition $\mathrm{zer}(A) \subset Z_A(2)$ means that moreover, such a $\phi$ can be chosen to be square integrable. For instance, this holds under the stronger condition that for any zero $x^*$ of $A$, $|A(\cdot\,,x^*)|$ is square integrable.

Lemma 4.3 (Passty). Let $(\lambda_n)$ be a non-summable sequence of positive reals, and $(a_n)$ any sequence in $\mathcal{H}$ with weighted averaged sequence $(\bar a_n)$. Assume there exists a non-empty closed convex subset $Q$ of $\mathcal{H}$ such that (i) weak subsequential limits of $(\bar a_n)$ lie in $Q$; and (ii) $\lim_n \|a_n - b\|$ exists for all $b \in Q$. Then $(\bar a_n)$ converges weakly to an element of $Q$.

Proof. See [31]. □

Theorem 2. Let Assumptions 1-4 hold true. Consider the random sequence $(x_n)$ given by (3.1) with weighted averaged sequence $(\bar x_n)$. Let $G \in \mathcal{F}$ be an event such that for almost every $\omega \in G$, $(x_n(\omega))$ has the property UI. Then, almost surely on $G$, $(\bar x_n)$ converges weakly to a point in $\mathrm{zer}(A)$.

Proof. It is a consequence of Proposition 1(i), Theorem 1 and Lemma 4.3. □

Theorem 2 establishes the almost sure weak ergodic convergence of the stochastic proximal point algorithm under the abstract condition that w.p.1, $(x_n)$ has the property UI. We must now provide verifiable conditions under which this property indeed holds w.p.1. This is the purpose of the next section.
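
To make the averaging scheme concrete, here is a minimal Python sketch of the iteration (3.1) together with its weighted averaged sequence, on a toy scalar problem. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the operator $A(s,x) = x - s$ (whose resolvent is $(x + \lambda s)/(1+\lambda)$), the Gaussian law of $\xi_n$ with mean 2, the step sizes $\lambda_n = n^{-3/4}$ (non-summable, square-summable), and the function names `resolvent` and `stochastic_proximal_point`. In this toy case the mean operator is $x \mapsto x - 2$, whose unique zero is 2.

```python
import random


def resolvent(lam, s, x):
    """Resolvent J_lam(s, x) of the toy operator A(s, x) = x - s.

    Equivalently, the proximity operator of y -> 0.5 * (y - s)**2.
    """
    return (x + lam * s) / (1.0 + lam)


def stochastic_proximal_point(n_iter, x0=0.0, seed=0):
    """Run x_{n+1} = J_{lam_n}(xi_{n+1}, x_n) and track the weighted average.

    Returns the last iterate and sum(lam_k * x_k) / sum(lam_k).
    The step sizes and the law of xi are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = x0
    weighted_sum = 0.0
    step_sum = 0.0
    for n in range(1, n_iter + 1):
        lam = 1.0 / n ** 0.75          # not summable, but square-summable
        weighted_sum += lam * x        # accumulate lam_n * x_n
        step_sum += lam
        s = rng.gauss(2.0, 1.0)        # sample xi_{n+1}, mean 2
        x = resolvent(lam, s, x)       # x_{n+1}
    return x, weighted_sum / step_sum


if __name__ == "__main__":
    last, avg = stochastic_proximal_point(20000)
    print("last iterate:", last, "weighted average:", avg)
```

In this strongly monotone toy case the last iterate itself approaches 2; the weighted average is the quantity covered by Theorem 2 in the general, merely monotone, setting.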

5. Main results.

5.1. Case of a common domain. We first address the case where the domains $D_s$ of the operators $A(s,\cdot)$ ($s \in E$) are equal (at least for all $s$ outside a negligible set). We also need an additional assumption.

Assumption 5. For any bounded set $K \subset \mathcal{H}$, the family $(\|A_0(\cdot\,,x)\| : x \in K \cap \mathcal{D})$ is uniformly integrable.

Assumption 5 is satisfied if the following stronger condition holds for any bounded set $K \subset \mathcal{H}$:

$$\exists\, r_K > 0, \quad \sup_{x \in K \cap \mathcal{D}} \int \|A_0(s,x)\|^{1+r_K} d\mu(s) < \infty . \tag{5.1}$$

Corollary 1. Let Assumptions 1-5 hold true. Assume that the domains $D_s$ coincide for all $s$ outside a $\mu$-negligible set. Consider the random sequence $(x_n)$ given by (3.1) with weighted averaged sequence $(\bar x_n)$. Then, almost surely, $(\bar x_n)$ converges weakly to a zero of $A$.

Proof. By Proposition 1 and the fact that $D_s = \mathcal{D}$ for all $s$ $\mu$-a.e., there is a set of probability one such that for any $\omega$ in that set, there is a bounded set $K = K_\omega$ such that $x_n(\omega) \in K \cap \mathcal{D}$ for all $n \in \mathbb{N}^*$. By Assumption 5, the sequence $(\|A_0(\cdot\,, x_n(\omega))\| : n \in \mathbb{N}^*)$ is uniformly integrable. As $\|A_{\lambda_n}(\cdot\,, x_n(\omega))\| \le \|A_0(\cdot\,, x_n(\omega))\|$, the same holds for the sequence $(\|A_{\lambda_n}(\cdot\,, x_n(\omega))\| : n \in \mathbb{N}^*)$ and holds as well for the corresponding weighted averaged sequence. The conclusion follows from Theorem 2. □

5.2. Case of distinct domains. We now address the case where the domains $D_s$ may vary with $s$. The case is more involved, because Assumption 5 alone is not sufficient to ensure the convergence. The reason is that the inequality $\|A_{\lambda_n}(s,x_n)\| \le \|A_0(s,x_n)\|$ used to prove Corollary 1 no longer holds when $x_n \notin D_s$. Nonetheless, using that $A_\lambda(s,\cdot)$ is $\frac{1}{\lambda}$-Lipschitz continuous, the argument can be adapted provided that the iterates converge "quickly enough" to the essential domain $\mathcal{D}$. The crux of the paragraph is therefore to provide estimates of the distance between $x_n$ and the set $\mathcal{D}$. To this end, we shall need some regularity conditions on the collection of sets $D_s$. These conditions can be seen as an extension to possibly infinitely many sets of the bounded linear regularity condition of Bauschke et al. [8].

We define the mapping $\Pi : E \times \mathcal{H} \to \mathcal{H}$ by

$$\Pi(s,x) = \mathrm{proj}_{\mathrm{cl}(D_s)}(x) .$$

Note that $\Pi(s,x) = \lim_{\lambda \downarrow 0} J_\lambda(s,x)$ by [9, Theorem 23.47]. By Assumption 1, $\Pi$ is $\mathcal{E} \otimes \mathcal{B}(\mathcal{H})/\mathcal{B}(\mathcal{H})$-measurable as a pointwise limit of measurable maps. The distance between a point $x \in \mathcal{H}$ and $D_s$ coincides with $d(x, D_s) = \|x - \Pi(s,x)\|$.

Assumption 6. For every $M > 0$, there exists $\kappa_M > 0$ such that for all $x \in \mathcal{H}$ such that $\|x\| \le M$,

$$\int d(x, D_s)^2\, d\mu(s) \ge \kappa_M\, d(x, \mathcal{D})^2 .$$

The above assumption is quite mild, and is easier to illustrate in the case of finitely many sets. Following [8], we say that a finite collection of closed convex subsets $(X_1, \dots, X_m)$ over some Euclidean space is boundedly linearly regular if for every $M > 0$, there exists $\kappa'_M > 0$ such that for every $\|x\| \le M$,

$$\max_{i=1,\dots,m} d(x, X_i) \ge \kappa'_M\, d(x, X) \quad \text{where } X = \bigcap_{i=1}^m X_i \tag{5.2}$$
where implicitly $X \neq \emptyset$. Sufficient conditions for a collection of sets to be boundedly linearly regular can be found in [8] and references therein. For instance, the qualification condition $\bigcap_i \mathrm{ri}(X_i) \neq \emptyset$ is sufficient to ensure that $X_1, \dots, X_m$ are boundedly linearly regular, where ri stands for the relative interior.

Now consider the special case of Example 1 i.e., $\mu$ is finitely supported. Assume that $\mathcal{H}$ is a Euclidean space and that the domains $D_1, \dots, D_m$ of the operators $A(1,\cdot), \dots, A(m,\cdot)$ are closed. It is routine to check that Assumption 6 holds if and only if $D_1, \dots, D_m$ are boundedly linearly regular.
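
As a small numerical illustration of the regularity condition (5.2), the sketch below evaluates the ratio $\max_i d(x, X_i) / d(x, X)$ for two sets in the plane. The choice of sets (the two coordinate axes, whose intersection is the origin) and all function names are assumptions made for this example only. For this pair the ratio is bounded below by $1/\sqrt{2}$, so (5.2) holds with $\kappa'_M = 1/\sqrt{2}$ for every $M$.

```python
import math
import random


def dist_to_x_axis(x):
    """Distance from x = (x1, x2) to X_1 = {(t, 0) : t real}."""
    return abs(x[1])


def dist_to_y_axis(x):
    """Distance from x = (x1, x2) to X_2 = {(0, t) : t real}."""
    return abs(x[0])


def dist_to_intersection(x):
    """Distance from x to X = X_1 ∩ X_2 = {(0, 0)}."""
    return math.hypot(x[0], x[1])


def regularity_ratio(x):
    """Ratio max_i d(x, X_i) / d(x, X); (5.2) asks for a positive lower bound."""
    return max(dist_to_x_axis(x), dist_to_y_axis(x)) / dist_to_intersection(x)


def worst_ratio(n_samples=10000, radius=5.0, seed=0):
    """Smallest ratio observed over random points of the box [-radius, radius]^2."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(n_samples):
        x = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        if dist_to_intersection(x) > 0.0:   # skip the origin itself
            worst = min(worst, regularity_ratio(x))
    return worst


if __name__ == "__main__":
    print("empirical lower bound on the ratio:", worst_ratio())
```

The empirical minimum is attained near the diagonals $|x_1| = |x_2|$, which is where the two distances to the axes are balanced.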

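Lemma 5.1. Let Assumptions 1, 2 and 6 hold true. Assume that $\lambda_n/\lambda_{n+1} \to 1$ as $n \to +\infty$ and $\mathcal{D} \neq \emptyset$. For each $n$, consider an $\mathcal{F}_n$-measurable random variable $\delta_n$ on $\mathcal{H}$. Assume that the sequence $(\mathbb{E}_n\|\delta_{n+1}\|^2)$ is bounded almost surely and in $L^1(\Omega, \mathbb{R}_+, \mathbb{P})$. Consider the sequence $(x_n)$ given by

$$x_{n+1} = \Pi(\xi_{n+1}, x_n) + \lambda_n \delta_{n+1} . \tag{5.3}$$

Assume that, with probability one, $(x_n)$ is bounded. Then

$$\sup_n \frac{\sum_{k \le n} d(x_k, \mathcal{D})}{\sum_{k \le n} \lambda_k} < \infty \quad \text{a.s.}$$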

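Proof. Consider an arbitrary point $u \in \mathcal{D}$. By definition of $\mathcal{D}$, $u \in D_s$ for all $s$ $\mu$-a.e. For any $\beta > 0$,

$$\|x_{n+1} - u\|^2 \le (1+\beta)\|\Pi(\xi_{n+1}, x_n) - u\|^2 + \lambda_n^2 \big(1 + \tfrac{1}{\beta}\big)\|\delta_{n+1}\|^2 .$$

As $\Pi(\xi_{n+1}, \cdot)$ is firmly non-expansive,

$$\|x_{n+1} - u\|^2 \le (1+\beta)\big(\|x_n - u\|^2 - \|x_n - \Pi(\xi_{n+1}, x_n)\|^2\big) + \lambda_n^2 \big(1 + \tfrac{1}{\beta}\big)\|\delta_{n+1}\|^2 .$$

The above inequality holds for any $u \in \mathcal{D}$ and thus for any $u \in \mathrm{cl}(\mathcal{D})$. It holds in particular when substituting $u$ with $\mathrm{proj}_{\mathrm{cl}(\mathcal{D})}(x_n)$. Remarking that $d(x_{n+1}, \mathcal{D}) \le \|x_{n+1} - \mathrm{proj}_{\mathrm{cl}(\mathcal{D})}(x_n)\|$, it follows that

$$d(x_{n+1}, \mathcal{D})^2 \le (1+\beta)\big(d(x_n, \mathcal{D})^2 - \|x_n - \Pi(\xi_{n+1}, x_n)\|^2\big) + \lambda_n^2 \big(1 + \tfrac{1}{\beta}\big)\|\delta_{n+1}\|^2 .$$

Consider a fixed $M > 0$, and denote by $B_n$ the probability event $\bigcap_{k \le n} \{\|x_k\| \le M\}$. Denote by $\chi_B$ the characteristic function of a set $B$, equal to 1 on $B$ and to zero outside. By Assumption 6,

$$\begin{aligned}
\mathbb{E}_n\big(\|x_n - \Pi(\xi_{n+1}, x_n)\|^2 \chi_{B_n}\big) &= \int \|x_n - \Pi(s, x_n)\|^2 d\mu(s)\, \chi_{B_n} \\
&\ge \kappa_M\, d(x_n, \mathcal{D})^2 \chi_{B_n}
\end{aligned}$$

where $\kappa_M$ is the constant defined in Assumption 6. Define $t_n = t_{n,M}$ as the random variable $t_n = d(x_n, \mathcal{D})\chi_{B_n}$. Upon noting that $\chi_{B_{n+1}} \le \chi_{B_n}$, we obtain

$$t_{n+1}^2 \le (1+\beta)(1 - \kappa_M)\, t_n^2 + \lambda_n^2 \big(1 + \tfrac{1}{\beta}\big)\|\delta_{n+1}\|^2 .$$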
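Taking the conditional expectation, $\mathbb{E}_n(t_{n+1}^2) \le (1+\beta)(1 - \kappa_M)\, t_n^2 + \lambda_n^2 (1 + \frac{1}{\beta})\, \mathbb{E}_n\|\delta_{n+1}\|^2$.

Define $\Delta_n = t_n/\lambda_n$. Using that $\lambda_n/\lambda_{n+1} \to 1$ and choosing $\beta$ small enough, there exist constants $0 < \rho < 1$, $c > 0$ and a deterministic integer $n_0$ depending on the sequence $(\lambda_n)$ and the constants $\beta$, $\kappa_M$ such that for all $n \ge n_0$,

$$\mathbb{E}_n(\Delta_{n+1}^2) \le \rho\, \Delta_n^2 + c\, \mathbb{E}_n\|\delta_{n+1}\|^2 . \tag{5.4}$$

Taking the expectation of both sides and using that $(\mathbb{E}\|\delta_{n+1}\|^2)$ is bounded, we obtain that the sequence $(\Delta_n)$ is uniformly bounded in $L^2(\Omega, \mathbb{R}_+, \mathbb{P})$. Now consider the sums

$$T_n = \sum_{k=n_0+1}^n t_k \quad \text{and} \quad \varphi_n = \sum_{k=n_0+1}^n \lambda_k .$$

Decompose $T_n = \sum_{k=n_0+1}^n \mathbb{E}_{k-1}(t_k) + R_n$ where

$$R_n = \sum_{k=n_0+1}^n \big(t_k - \mathbb{E}_{k-1} t_k\big) .$$

Note that $R_n$ is an $\mathcal{F}_n$-adapted martingale and $\mathbb{E}((t_k - \mathbb{E}_{k-1} t_k)^2) \le \mathbb{E}(t_k^2) \le C\lambda_k^2$ for some finite constant $C = \sup_n \mathbb{E}(\Delta_n^2)$. As $\sum_k \lambda_k^2 < \infty$, we deduce that $R_n$ converges a.s. to some r.v. which is finite $\mathbb{P}$-a.e. As a consequence, $R_n/\varphi_n$ tends a.s. to zero. On the other hand, by Jensen's inequality,

$$T_n \le \sum_{k=n_0+1}^n \mathbb{E}_{k-1}(t_k^2)^{1/2} + |R_n| .$$

By (5.4) again and the assumption that $\mathbb{E}_n\|\delta_{n+1}\|^2$ is bounded a.s., there exists a finite r.v. $Z > 0$ such that, almost surely, $\mathbb{E}_n(\Delta_{n+1}^2) \le \rho\, \Delta_n^2 + cZ$. Thus, there exist other constants $\rho < \rho_1 < 1$ and $c_1$ such that $\mathbb{E}_n(\Delta_{n+1}^2)^{1/2} \le \rho_1 \Delta_n + c_1 Z$. Using that $\lambda_n/\lambda_{n+1} \to 1$, we obtain

$$\mathbb{E}_n(t_{n+1}^2)^{1/2} \le \rho_2\, t_n + c_1 \lambda_{n+1} Z$$

for some constant $\rho_1 < \rho_2 < 1$. As a consequence,

$$\frac{T_n}{\varphi_n} \le \frac{c_2 Z}{1 - \rho_2} + \frac{|R_n|}{(1 - \rho_2)\varphi_n} .$$

Therefore, for every $M > 0$, there exists a probability one event on which $T_n/\varphi_n$ is bounded. Hence, on a probability one set, for every integer $M > 0$, the sequence

$$\frac{\sum_{k \le n} d(x_k, \mathcal{D})\chi_{B_k}}{\sum_{k \le n} \lambda_k}$$

is bounded. As $(x_n)$ is bounded w.p.1, the conclusion follows. □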

Assumption 7. There exist $p \in \mathbb{N}^*$ and $C \in L^2(E, \mathbb{R}_+, \mu)$ such that for any $x \in \mathcal{H}$, $\lambda > 0$,

$$\|J_\lambda(s,x) - \Pi(s,x)\| \le \lambda\, C(s)(1 + \|x\|^p)$$

and $Z_A(2p) \neq \emptyset$.

We recall that $J_\lambda(s,x)$ converges to the best approximation $\Pi(s,x)$ of $x$ in $D_s$ when $\lambda \downarrow 0$. Assumption 7 provides an additional condition on the rate. Loosely speaking, the condition means that the resolvent value $J_\lambda(s,x)$ should be at distance $O(\lambda)$ from the projection $\Pi(s,x)$. A sufficient condition will be provided in Section 6 in the case of subdifferentials.

The second condition $Z_A(2p) \neq \emptyset$ means that there exists a zero of $A$, say $x^*$, for which one can find a $(2p)$-integrable selection $\phi \in A(\cdot\,, x^*)$ such that $\int \phi\, d\mu = 0$. This is for instance the case if $|A(\cdot\,, x^*)|^{2p}$ is integrable.

Proposition 2. Let Assumptions 1, 2, 6 and 7 hold true. Suppose that $\lambda_n/\lambda_{n+1} \to 1$ as $n \to \infty$. Then, the sequence $(x_n)$ given by (3.1) satisfies almost surely

$$\sup_n \frac{\sum_{k \le n} d(x_k, \mathcal{D})}{\sum_{k \le n} \lambda_k} < \infty .$$

Proof. The sequence $(x_n)$ satisfies (5.3) if we set

$$\delta_{n+1} = \big(J_{\lambda_n}(\xi_{n+1}, x_n) - \Pi(\xi_{n+1}, x_n)\big)/\lambda_n .$$

By Assumption 7, $\mathbb{E}_n\|\delta_{n+1}\|^2 \le c(1 + \|x_n\|^{2p})$ for some constant $c > 0$. Therefore, by Proposition 1(iii), $\mathbb{E}_n\|\delta_{n+1}\|^2$ is uniformly bounded almost surely and in $L^1(\Omega, \mathbb{R}_+, \mathbb{P})$. The conclusion of Lemma 5.1 applies. □

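Theorem 3. Let Assumptions 1-7 hold true and let $\lambda_n/\lambda_{n+1} \to 1$ as $n \to \infty$. Consider the random sequence $(x_n)$ given by (3.1) with weighted averaged sequence $(\bar x_n)$. Then, almost surely, $(\bar x_n)$ converges weakly to a zero of $A$.

Proof. For every $n$, choose any point $z_n \in \mathcal{D}$ such that $\|z_n - x_n\| \le 2d(x_n, \mathcal{D})$. As $A_\lambda(s,\cdot)$ is $\frac{1}{\lambda}$-Lipschitz continuous,

$$\|A_{\lambda_n}(s,x_n)\| \le \|A_{\lambda_n}(s,z_n)\| + \frac{2d(x_n, \mathcal{D})}{\lambda_n} .$$

Using moreover that $\|A_{\lambda_n}(s,z_n)\| \le \|A_0(s,z_n)\|$,

$$\frac{\sum_{k=1}^n \lambda_k \|A_{\lambda_k}(s,x_k)\|}{\sum_{k=1}^n \lambda_k} \le \frac{\sum_{k=1}^n \lambda_k \|A_0(s,z_k)\|}{\sum_{k=1}^n \lambda_k} + \frac{2\sum_{k=1}^n d(x_k, \mathcal{D})}{\sum_{k=1}^n \lambda_k} .$$

By Proposition 2,

$$\frac{\sum_{k=1}^n \lambda_k \|A_{\lambda_k}(s,x_k)\|}{\sum_{k=1}^n \lambda_k} \le \frac{\sum_{k=1}^n \lambda_k \|A_0(s,z_k)\|}{\sum_{k=1}^n \lambda_k} + C \tag{5.5}$$

where $C$ is a r.v. independent of $n$ and $s$ and which is finite $\mathbb{P}$-a.e. By Assumption 5, the family $\|A_0(\cdot\,, z_k(\omega))\|$ is uniformly integrable for almost every $\omega$. Thus, the same holds for the corresponding averaged sequence, which in turn implies that the functions of $s$ given by the lhs of (5.5) are uniformly integrable. The conclusion follows from Theorem 2. □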

5.3. Strong monotonicity and strong convergence. We prove the following. 

Theorem 4. Let Assumptions 1, 2 hold true. Assume that for every $s \in E$, $A(s,\cdot)$ is strongly monotone with modulus $\alpha(s)$ where $\alpha : E \to \mathbb{R}_+$ is a measurable function such that $\mathbb{P}(\alpha(\xi_1) \neq 0) > 0$. Then $A$ is strongly monotone and, as such, admits a unique zero $x^*$. If $x^* \in Z_A(2)$ then, almost surely, the sequence $(x_n)$ defined by (3.1) converges strongly to $x^*$.
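Proof. Set $(x,y)$ and $(x',y')$ in $\mathrm{gr}(A)$. Let $\phi$ and $\phi'$ be integrable selections of $A(\cdot\,,x)$ and $A(\cdot\,,x')$ respectively such that $y = \int \phi\, d\mu$ and $y' = \int \phi'\, d\mu$. Then,

$$\langle \phi(s) - \phi'(s), x - x'\rangle \ge \alpha(s)\|x - x'\|^2 .$$

Integrating over $s$ and noting that $\int \alpha\, d\mu > 0$ by hypothesis, we deduce that $A$ is strongly monotone. Let $x^*$ be its unique zero and assume that $x^* \in Z_A(2)$. Note that there is no restriction in assuming that $\alpha(\cdot) \le 1$ (otherwise just replace $\alpha(\cdot)$ with $\min(\alpha(\cdot), 1)$).

By strong monotonicity, the inequality (4.1) of Lemma 4.1 can be replaced by

$$\langle A_\lambda(s,x) - \phi(s), x - u\rangle \ge \alpha(s)\|J_\lambda(s,x) - u\|^2 + \lambda(1-\beta)\|A_\lambda(s,x)\|^2 - \frac{\lambda}{4\beta}\|\phi(s)\|^2 .$$

As a consequence, Equation (4.2) can be replaced by

$$\begin{aligned}
\|x_{n+1} - x^*\|^2 \le{}& \|x_n - x^*\|^2 - \lambda_n^2 (1 - 2\beta)\|A_{\lambda_n}(\xi_{n+1}, x_n)\|^2 \\
& - 2\lambda_n \alpha(\xi_{n+1})\|x_{n+1} - x^*\|^2 + \frac{\lambda_n^2}{2\beta}\|\phi(\xi_{n+1})\|^2 - 2\lambda_n \langle \phi(\xi_{n+1}), x_n - x^*\rangle
\end{aligned} \tag{5.6}$$

where $\phi$ is a measurable selection of $A(\cdot\,, x^*)$ such that $\int \phi\, d\mu = 0$. In the sequel, we shall simply set $\beta = \frac{1}{2}$. On the other hand, by straightforward algebra,

$$\begin{aligned}
\|x_{n+1} - x^*\|^2 &\ge \|x_n - x^*\|^2 + 2\langle x_{n+1} - x_n, x_n - x^*\rangle \\
&= \|x_n - x^*\|^2 - 2\lambda_n \langle A_{\lambda_n}(\xi_{n+1}, x_n), x_n - x^*\rangle \\
&\ge \|x_n - x^*\|^2 - \lambda_n \|A_{\lambda_n}(\xi_{n+1}, x_n)\|^2 - \lambda_n \|x_n - x^*\|^2
\end{aligned}$$

and by plugging the above inequality into (5.6), using $\alpha(\cdot) \le 1$ and recalling $\beta = \frac{1}{2}$,

$$\begin{aligned}
\|x_{n+1} - x^*\|^2 \le{}& (1 + 2\lambda_n^2)\|x_n - x^*\|^2 - 2\lambda_n \alpha(\xi_{n+1})\|x_n - x^*\|^2 + 2\lambda_n^2 \|A_{\lambda_n}(\xi_{n+1}, x_n)\|^2 \\
& + \lambda_n^2 \|\phi(\xi_{n+1})\|^2 - 2\lambda_n \langle \phi(\xi_{n+1}), x_n - x^*\rangle .
\end{aligned}$$

Applying the conditional expectation $\mathbb{E}_n$ on both sides, and setting $\bar\alpha = \int \alpha\, d\mu$ and $V_n = 2\mathbb{E}_n\|A_{\lambda_n}(\xi_{n+1}, x_n)\|^2 + \int \|\phi\|^2 d\mu$, we obtain

$$\mathbb{E}_n(\|x_{n+1} - x^*\|^2) \le (1 + 2\lambda_n^2)\|x_n - x^*\|^2 - 2\lambda_n \bar\alpha \|x_n - x^*\|^2 + \lambda_n^2 V_n .$$

By Proposition 1(ii) and the fact that $(\lambda_n) \in \ell^2$, one has $\sum_n \lambda_n^2 V_n < \infty$ a.s. Therefore, by [33],

$$\sum_n \lambda_n \bar\alpha \|x_n - x^*\|^2 < \infty \quad \text{a.s.}$$

By the standing hypothesis, $\bar\alpha > 0$, thus $\sum_n \lambda_n \|x_n - x^*\|^2 < \infty$ a.s. Since $\|x_n - x^*\|$ converges a.s. by Proposition 1(i) and since $(\lambda_n) \notin \ell^1$, it follows that $\|x_n - x^*\| \to 0$. □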


6. Application to convex programming. 




6.1. Problem and Algorithm. Consider the context of Example 3. Let $f : E \times \mathcal{H} \to (-\infty, +\infty]$ be a normal convex integrand. Denote by $F(x) = \int f(s,x)\, d\mu(s)$ the corresponding integral functional. Identifying $\partial f$ with the operator $A$ of Section 3, the resolvent $J_\lambda$ coincides with the proximity operator $(s,x) \mapsto \mathrm{prox}_{\lambda f(s,\cdot)}(x)$ defined in (1.2). The iterations (3.1) write

$$x_{n+1} = \mathrm{prox}_{\lambda_n f(\xi_{n+1}, \cdot)}(x_n) . \tag{6.1}$$

The aim is to prove the almost sure weak convergence in average of $(x_n)$ to a minimizer of $F$ (assumed to exist). We denote by $\partial f_0(s,x)$ the element of $\partial f(s,x)$ with smallest norm. We denote by $\mathcal{D}$ the essential intersection of the sets $D_s = \mathrm{dom}(\partial f(s,\cdot))$ for $s \in E$.

Assumption 8.

(i) $f : E \times \mathcal{H} \to (-\infty, +\infty]$ is a normal convex integrand.

(ii) $F$ is proper and lower semicontinuous.

(iii) For all $x \in \mathcal{H}$, $\partial F(x) = \int \partial f(s,x)\, d\mu(s)$.

(iv) The set of minimizers of $F$ is non-empty and included in $Z_{\partial f}(2)$.

Assumption 8(iii) has been discussed in Example 3. 

6.2. Case of a common domain. 

Theorem 5. Let Assumptions 2 and 8 hold true. Assume that the domains $D_s$ coincide for all $s$ outside a $\mu$-negligible set. Assume that for any bounded set $K \subset \mathcal{H}$, the family $(\|\partial f_0(\cdot\,, x)\| : x \in K \cap \mathcal{D})$ is uniformly integrable. Consider the random sequence $(x_n)$ given by (6.1) with weighted averaged sequence $(\bar x_n)$. Then, almost surely, $(\bar x_n)$ converges weakly to a minimizer of $F$.

Proof. We prove that $A = \partial f$ satisfies the conditions of Assumptions 1 and 3 and the conclusion follows from Corollary 1. Operator $\partial f(s,\cdot)$ is maximal monotone for any given $s \in E$, see e.g. [9, Theorem 21.2]. For a fixed $x \in \mathcal{H}$, $\partial f(\cdot\,,x)$ is measurable, see [36, Corollary 4.6] and [30, Theorem 3] in the infinite dimensional case. The proximity operator $J_\lambda(\cdot\,,x)$ is $\mathcal{E}/\mathcal{B}(\mathcal{H})$ measurable, see [35, Lemma 4] (combined with [39, Proposition 2] in the infinite dimensional case). Therefore, $A = \partial f$ satisfies the conditions in Assumption 1.

Note that $F$ is a convex function. By Assumption 8(ii) and [9, Theorem 21.2], $\partial F$ is maximal monotone. Using moreover Assumption 8(iii), the condition in Assumption 3 is satisfied. Finally, Assumptions 1-5 are fulfilled and the conclusion follows from Corollary 1. □

6.3. Case of distinct domains. When domains $D_s$ are possibly distinct, the convergence result will follow from Theorem 3. We should therefore verify the conditions under which the latter holds. Checking Assumptions 1-5 follows the same lines as in Section 6.2 and is relatively easy. Assumption 6 will be kept as a standing assumption. The goal is therefore to provide a verifiable condition under which Assumption 7 holds. This condition is given as follows.

Assumption 9. There exist $p \in \mathbb{N}^*$ and $C \in L^2(E, \mathbb{R}_+, \mu)$ such that for all $s \in E$ $\mu$-a.e. and all $x \in \mathrm{dom}(\partial f(s,\cdot))$,

$$\|\partial f_0(s,x)\| \le C(s)(1 + \|x\|^p)$$

and $Z_{\partial f}(2p) \neq \emptyset$. Moreover, $\mathrm{dom}(\partial f(s,\cdot))$ is closed $\mu$-a.e.
In order to verify that the above condition is indeed sufficient to ensure that Assumption 7 holds, we need the following lemma.

Lemma 6.1. Let $g : \mathcal{H} \to (-\infty, +\infty]$ be a proper lower semicontinuous convex function. Consider $x \in \mathcal{H}$ and $\lambda > 0$. Let $\pi$ be the projection of $x$ onto $\mathrm{dom}(g)$. Assume that $\partial g(\pi) \neq \emptyset$. Then, $\|\mathrm{prox}_{\lambda g}(x) - \pi\| \le 2\lambda \|\partial g_0(\pi)\|$.

Proof. When $x = \pi$, the result is standard [9, Corollary 23.10] (and the factor 2 in the inequality can even be omitted). We assume in the sequel that $x \neq \pi$. Define $j = \mathrm{prox}_{\lambda g}(x)$, $\varphi = \partial g_0(\pi)$ and

$$q = \arg\min_{y \in H}\ g(\pi) + \langle \varphi, y - \pi\rangle + \frac{\|y - x\|^2}{2\lambda} \tag{6.2}$$

where $H$ is the half-space $\{y \in \mathcal{H} : \langle y - \pi, x - \pi\rangle \le 0\}$. By the Karush-Kuhn-Tucker conditions, there exists $a \ge 0$ such that $\lambda\varphi = -q + x - a(x - \pi)$ along with the complementary slackness condition $a\langle q - \pi, x - \pi\rangle = 0$. Now as $\varphi \in \partial g(\pi)$ and $(x - j)/\lambda \in \partial g(j)$, it follows by monotonicity of $\partial g$ that

$$\begin{aligned}
0 &\le \langle \lambda\varphi - x + j, \pi - j\rangle \\
&= \langle j - q, \pi - j\rangle + a\langle x - \pi, j - \pi\rangle .
\end{aligned}$$

As $\langle x - \pi, j - \pi\rangle \le 0$, we have $0 \le \langle j - q, \pi - j\rangle$ which in turn implies that $\|j - \pi\| \le \|q - \pi\|$. As $q \in H$, it is clear that $\|x - \pi\| \le \|q - x\|$ and thus $\|q - \pi\| \le \|q - x\| + \|x - \pi\| \le 2\|q - x\|$. Putting all pieces together, $\|j - \pi\| \le 2\|q - x\|$. Recall the identity $q - x = -\lambda\varphi - a(x - \pi)$. If $a = 0$, then $\|q - x\| = \lambda\|\varphi\|$ and the conclusion $\|j - \pi\| \le 2\lambda\|\varphi\|$ follows. If $a > 0$, the complementary slackness condition yields $\langle q - \pi, x - \pi\rangle = 0$. Replacing $q$ by its expression as a function of $a$, this allows to write $a = \langle \lambda\varphi, \pi - x\rangle/\|x - \pi\|^2$. Hence, $q - x = -\lambda P\varphi$ where $P$ is an orthogonal projection matrix. Therefore, $\|q - x\| \le \lambda\|\varphi\|$ and again, the conclusion $\|j - \pi\| \le 2\lambda\|\varphi\|$ follows. □
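
The bound of Lemma 6.1 can be checked numerically on a one-dimensional example. The sketch below is only an illustration under stated assumptions: the function $g(y) = a\,y$ on an interval $[lo, hi]$ (and $+\infty$ outside) is our own choice, as are the helper names `clip`, `prox_linear_on_interval`, `min_norm_subgradient` and `lemma_gap`. For this $g$, the proximity operator is the clipped point $x - \lambda a$, and the least-norm subgradient at an endpoint is obtained by projecting 0 onto the half-line of subgradients.

```python
def clip(y, lo, hi):
    """Projection of the real number y onto the interval [lo, hi]."""
    return max(lo, min(hi, y))


def prox_linear_on_interval(lam, a, lo, hi, x):
    """prox_{lam * g}(x) for g(y) = a * y on [lo, hi] and +inf outside.

    The unconstrained minimizer of a*y + (y - x)**2 / (2*lam) is x - lam*a;
    in one dimension the constrained minimizer is its clipped value.
    """
    return clip(x - lam * a, lo, hi)


def min_norm_subgradient(a, lo, hi, y):
    """Least-norm element of the subdifferential of g at y in [lo, hi]."""
    if y <= lo:
        return min(a, 0.0)   # subdifferential at lo is (-inf, a]
    if y >= hi:
        return max(a, 0.0)   # subdifferential at hi is [a, +inf)
    return a                 # g is differentiable in the interior


def lemma_gap(lam, a, lo, hi, x):
    """Return rhs - lhs of the bound |prox(x) - pi| <= 2*lam*|dg_0(pi)|."""
    pi = clip(x, lo, hi)     # projection of x onto dom(g) = [lo, hi]
    lhs = abs(prox_linear_on_interval(lam, a, lo, hi, x) - pi)
    rhs = 2.0 * lam * abs(min_norm_subgradient(a, lo, hi, pi))
    return rhs - lhs


if __name__ == "__main__":
    for x in (-3.0, 0.4, 2.5):
        print(x, lemma_gap(0.5, -2.0, 0.0, 1.0, x))
```

A non-negative value of `lemma_gap` means the inequality of the lemma holds at that point; the example only exercises this particular family of functions and is not a substitute for the proof.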

Theorem 6. Let Assumptions 2, 6, 8, and 9 hold true. Suppose that $\lambda_n/\lambda_{n+1} \to 1$ as $n \to \infty$. Consider the random sequence $(x_n)$ given by (6.1) with weighted averaged sequence $(\bar x_n)$. Then, almost surely, $(\bar x_n)$ converges weakly to a minimizer of $F$.

Proof. When letting $A = \partial f$, the conditions in Assumptions 1-4 are fulfilled by using the same arguments as in the proof of Theorem 5. Moreover, Assumption 9 implies that the uniform integrability condition of Assumption 5 holds. To apply Theorem 3, it is sufficient to verify the condition of Assumption 7 replacing $J_\lambda(s,\cdot)$ with $\mathrm{prox}_{\lambda f(s,\cdot)}$. By Lemma 6.1 and using $\Pi(s,x) \in D_s$, the following holds $\mu$-a.e.:

$$\begin{aligned}
\|\mathrm{prox}_{\lambda f(s,\cdot)}(x) - \Pi(s,x)\| &\le 2\lambda \|\partial f_0(s, \Pi(s,x))\| \\
&\le 2\lambda\, C(s)(1 + \|\Pi(s,x)\|^p) .
\end{aligned}$$

Let $x^*$ be an arbitrary point in $\mathcal{D}$. One has $\|\Pi(s,x)\| \le \|x^*\| + \|\Pi(s,x) - \Pi(s,x^*)\|$ where we used the fact that $x^* = \Pi(s,x^*)$ for all $s$ $\mu$-a.e. By non-expansiveness of $\Pi(s,\cdot)$, $\|\Pi(s,x)\| \le \|x^*\| + \|x - x^*\|$. Finally, there exists a constant $\alpha$ depending only on $p$ and $x^*$ such that $\|\mathrm{prox}_{\lambda f(s,\cdot)}(x) - \Pi(s,x)\| \le \lambda\alpha\, C(s)(1 + \|x\|^p)$. The conclusion follows from Theorem 3. □

6.4. A constrained programming problem. In this section, we provide an application example to the case of constrained convex minimization over a finite intersection of closed convex sets.

Let $(X_1, \dots, X_m)$ be a collection of non-empty closed convex subsets of $\mathcal{H} = \mathbb{R}^d$ where $d \in \mathbb{N}^*$. We consider the problem

$$\min F(x) \ \text{ w.r.t. } x \in X \quad \text{where } X = \bigcap_{i=1}^m X_i \tag{6.3}$$

where $F(x) = \int f(s,x)\, d\mu(s)$ for all $x \in \mathcal{H}$. Consider a random sequence $(I_n)$ on $\{0, 1, \dots, m\}$ independent of $(\xi_n)$, with distribution $p_i = \mathbb{P}(I_n = i)$ for every $i \in \{0, 1, \dots, m\}$. Consider the iterations

$$x_{n+1} = \begin{cases} \mathrm{prox}_{\lambda_n f(\xi_{n+1}, \cdot)}(x_n) & \text{if } I_{n+1} = 0 \\ \mathrm{proj}_{X_{I_{n+1}}}(x_n) & \text{otherwise.} \end{cases} \tag{6.4}$$

Let us briefly discuss the algorithm. At each time $n$, the iteration consists in applying either the proximity operator of $f(\xi_{n+1}, \cdot)$ or a projection. The choice is random, the former being applied when the r.v. $I_{n+1}$ is zero, the latter being applied otherwise. The value $p_0$ represents the probability that the proximity operator of $f(\xi_{n+1}, \cdot)$ is applied. Conversely, when $I_{n+1} > 0$, a certain set is further picked at random, and the projection onto that set is applied.

Remark 1. Instead of applying either the proximity operator of $f(\xi_{n+1}, \cdot)$ or a projection, one could think of applying both successively, in the flavor of Passty's algorithm [31]. Although it is out of the scope of this paper, the corresponding algorithm may be analyzed using similar principles.
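
The iterations (6.4) can be sketched in a few lines of Python. The instance below is hypothetical and chosen only so that the answer is known in closed form: $\mathcal{H} = \mathbb{R}^2$, $f(s,x) = \frac{1}{2}\|x - s\|^2$ with $\xi_n$ Gaussian of mean $(2,2)$, and the two half-spaces $X_i = \{x : x_i \le 1\}$, so that the solution of (6.3) is the projection $(1,1)$ of the mean onto $X$. The step sizes $\lambda_n = n^{-0.6}$, the probabilities $(p_0, p_1, p_2)$ and all function names are likewise assumptions of this sketch.

```python
import random


def prox_quadratic(lam, s, x):
    """Proximity operator of y -> 0.5 * ||y - s||^2 with parameter lam."""
    return tuple((xi + lam * si) / (1.0 + lam) for xi, si in zip(x, s))


def project_halfspace(i, x, bound=1.0):
    """Projection onto X_i = {x : x[i-1] <= bound}, for i in {1, 2}."""
    y = list(x)
    y[i - 1] = min(y[i - 1], bound)
    return tuple(y)


def randomized_prox_projection(n_iter, probs=(0.5, 0.25, 0.25), seed=0):
    """Run iterations of the form (6.4) and return the weighted average."""
    rng = random.Random(seed)
    x = (0.0, 0.0)
    weighted_sum = [0.0, 0.0]
    step_sum = 0.0
    for n in range(1, n_iter + 1):
        lam = 1.0 / n ** 0.6                      # lam_n / lam_{n+1} -> 1
        weighted_sum = [w + lam * xi for w, xi in zip(weighted_sum, x)]
        step_sum += lam
        i = rng.choices((0, 1, 2), weights=probs)[0]   # sample I_{n+1}
        if i == 0:
            # Proximal step on a freshly sampled f(xi_{n+1}, .).
            s = (rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5))
            x = prox_quadratic(lam, s, x)
        else:
            # Projection onto the randomly selected constraint set.
            x = project_halfspace(i, x)
    return tuple(w / step_sum for w in weighted_sum)


if __name__ == "__main__":
    print("weighted average:", randomized_prox_projection(20000))
```

Only one constraint set is touched per iteration, which is the practical appeal of the scheme when $m$ is large or when projecting onto the full intersection $X$ is expensive.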

Assumption 10.

(i) The sets $X_1, \dots, X_m$ are boundedly linearly regular in the sense of (5.2) and $X = \bigcap_i X_i$ is non-empty.

(ii) $f : E \times \mathcal{H} \to \mathbb{R}$ is a normal convex integrand and $f(\cdot\,,x)$ is integrable for each $x \in \mathcal{H}$.

(iii) A solution to (6.3) exists and any solution $x^*$ satisfies $|\partial f(\cdot\,, x^*)| \in L^2(E, \mathbb{R}, \mu)$.

(iv) There exist $p \in \mathbb{N}^*$ and a solution $x^*$ such that $|\partial f(\cdot\,, x^*)| \in L^{2p}(E, \mathbb{R}, \mu)$.

(v) There exists $C \in L^2(E, \mathbb{R}_+, \mu)$ such that for any $x \in \mathcal{H}$, $\|\partial f_0(s,x)\| \le C(s)(1 + \|x\|^p)$ $\mu$-a.e.

Theorem 7. Let Assumptions 2 and 10 hold. Consider the iterates $(x_n)$ given by (6.4) with weighted averaged sequence $(\bar x_n)$ where the random sequence $(I_n)$ is defined above. Assume that $p_i > 0$ for all $i \in \{0, 1, \dots, m\}$ and let $\lambda_n/\lambda_{n+1} \to 1$ as $n \to \infty$. Then, almost surely, $(x_n)$ converges in average to a solution to (6.3).

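Proof. We introduce the random sequence $\tilde\xi_n = (\xi_n, I_n)$ on the set $\tilde E = E \times \{0, 1, \dots, m\}$ equipped with the corresponding product $\sigma$-algebra. We denote by $\nu = \mu \otimes (\sum_{i=0}^m p_i \delta_i)$ the probability distribution of $\tilde\xi_n$, where $\delta_i$ stands for the Dirac measure at $i$. For all $\tilde s = (s, i)$ in $\tilde E$ and $x \in \mathcal{H}$, define

$$\tilde f(\tilde s, x) = f(s,x)\chi_{\{0\}}(i) + \sum_{j=1}^m \iota_{X_j}(x)\chi_{\{j\}}(i)$$

where $\chi_C$ is the characteristic function of a set $C$ (equal to 1 on that set and zero outside) and $\iota_C$ is the indicator function of a set $C$ (equal to 0 on that set and $+\infty$ outside). We use the convention $0 \times (+\infty) = 0$. The iterations (6.4) also write

$$x_{n+1} = \mathrm{prox}_{\lambda_n \tilde f(\tilde\xi_{n+1}, \cdot)}(x_n) .$$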
The rest of the proof consists once again in checking the conditions of application of Theorem 3, when $E$, $A$ are respectively replaced by $\tilde E$, $\partial\tilde f$.

Checking Assumptions 1, 3 and 4. We first make the following observations.

(i) $\tilde f$ is a normal convex integrand on $\tilde E \times \mathcal{H} \to (-\infty, +\infty]$.

(ii) As $f(\cdot\,,x)$ is integrable for any $x$, it follows that $F = \int f(\cdot\,,x)\, d\mu$ is proper, convex and continuous. Since $p_i > 0$ for all $i$, the integral functional $\tilde F(x) = \int \tilde f(\cdot\,,x)\, d\nu$ is equal to

$$\tilde F(x) = p_0 F(x) + \iota_X(x)$$

where $X = \bigcap_{i=1}^m X_i$. As $X$ is a non-empty closed convex set and $\mathrm{dom}(F) = \mathcal{H}$, it follows that $\tilde F$ is proper and lower semicontinuous.

(iii) Let $N_C(x)$ denote the normal cone of a closed convex set $C$ at point $x$. By the same argument,

$$\partial\tilde F(x) = p_0\, \partial F(x) + N_X(x) .$$

Moreover, for any $\tilde s = (s, i)$,

$$\partial\tilde f(\tilde s, x) = \partial f(s,x)\chi_{\{0\}}(i) + \sum_{j=1}^m N_{X_j}(x)\chi_{\{j\}}(i) \tag{6.5}$$

and it follows that

$$\int \partial\tilde f(\cdot\,, x)\, d\nu = p_0 \int \partial f(\cdot\,, x)\, d\mu + \sum_{i=1}^m N_{X_i}(x) .$$

By Assumption 10(i), the sets $X_1, \dots, X_m$ are linearly regular. By [8, Theorem 3.6], this implies that $\sum_{i=1}^m N_{X_i}(x) = N_X(x)$. Moreover, as $F$ is everywhere finite, $\int \partial f(\cdot\,,x)\, d\mu = \partial F(x)$ by [34]. We conclude that for every $x \in \mathcal{H}$,

$$\int \partial\tilde f(\cdot\,, x)\, d\nu = \partial\tilde F(x) . \tag{6.6}$$

(iv) The minimizers of $\tilde F$ are the solutions to (6.3) and vice-versa. In particular, $\tilde F$ admits minimizers. Let us prove that each minimizer $x^*$ belongs to $Z_{\partial\tilde f}(2)$. By Fermat's rule, $0 \in \partial\tilde F(x^*)$. Using successively (6.6) and (6.5), there exist an integrable selection $\phi$ of $\partial f(\cdot\,, x^*)$ and $(u_1, \dots, u_m) \in N_{X_1}(x^*) \times \cdots \times N_{X_m}(x^*)$ such that $0 = p_0 \int \phi\, d\mu + \sum_{i=1}^m p_i u_i$. Define for any $(s,i) \in \tilde E$, $\tilde\phi(s,i) = \phi(s)\chi_{\{0\}}(i) + \sum_{j=1}^m u_j \chi_{\{j\}}(i)$. Clearly, $\tilde\phi(s,i) \in \partial\tilde f((s,i), x^*)$ and $\int \tilde\phi\, d\nu = 0$. By Assumption 10(iii), $\int \|\tilde\phi\|^2 d\nu < +\infty$. Therefore, $x^* \in Z_{\partial\tilde f}(2)$.

We have checked that the four conditions in Assumption 8 are fulfilled when $f$ and $F$ are respectively replaced by $\tilde f$ and $\tilde F$. Now set $A = \partial\tilde f$. Using the same arguments as in the proof of Theorem 5, the operator $A$ satisfies the conditions in Assumptions 1, 3 and 4.

Assumption 2 being granted, it remains to check that $A = \partial\tilde f$ fulfills Assumptions 5, 6 and 7.

Checking Assumptions 5 and 6. By Equation (6.5), $\partial f_0(s,x)\chi_{\{0\}}(i) \in \partial\tilde f(\tilde s, x)$. Therefore, $\|\partial\tilde f_0(\tilde s, x)\| \le \|\partial f_0(s,x)\|$. By Assumption 10(v), the uniform integrability condition in Assumption 5 is fulfilled. Using the linear regularity of the sets $X_1, \dots, X_m$, Assumption 6 is satisfied when substituting $D_s$ with $\mathrm{dom}(\partial\tilde f(\tilde s, \cdot))$.

Checking Assumption 7. We finally check that $A = \partial\tilde f$ fulfills Assumption 7. Let $p \in \mathbb{N}^*$ and $x^*$ be defined as in Assumption 10(iv). Following the exact same line as above, one can construct $\tilde\phi$ such that $\tilde\phi(s,i) \in \partial\tilde f((s,i), x^*)$, $\int \tilde\phi\, d\nu = 0$ and $\int \|\tilde\phi\|^{2p} d\nu < +\infty$. Therefore $Z_{\partial\tilde f}(2p) \neq \emptyset$. Denote by $J_\lambda(\tilde s, x) = \mathrm{prox}_{\lambda\tilde f(\tilde s, \cdot)}(x)$ and $\Pi(\tilde s, x)$ the projection of $x$ onto the domain of $\partial\tilde f(\tilde s, \cdot)$. For any $\tilde s = (s,i)$, one has $J_\lambda(\tilde s, x) - \Pi(\tilde s, x) = 0$ if $i \ge 1$. When $i = 0$, $J_\lambda(\tilde s, x) = \mathrm{prox}_{\lambda f(s,\cdot)}(x)$ and $\Pi(\tilde s, x) = x$. Thus, $\frac{1}{\lambda}\|J_\lambda(\tilde s, x) - \Pi(\tilde s, x)\| \le \|\partial f_0(s,x)\|$ which is no larger than $C(s)(1 + \|x\|^p)$. As $C$ is square-integrable, we conclude that the operator $A = \partial\tilde f$ fulfills Assumption 7.

By Theorem 3, the iterates (6.4) almost surely converge weakly in average to a zero of $\partial\tilde F$. As zeroes of $\partial\tilde F$ coincide with solutions to (6.3), the proof is complete. □

7. Conclusion. In this paper, we introduced a stochastic proximal point algorithm for random maximal monotone operators and proved the almost sure weak ergodic convergence of the algorithm toward a zero of the Aumann expectation of the latter random operators. The paper suggests that, by using the concept of random monotone operators, it is possible to easily derive stochastic versions of different fixed point algorithms and to prove their almost sure convergence. This idea can be extended to provide stochastic counterparts of other algorithms: the forward-backward algorithm which involves both implicit and explicit calls of the operators [9], Passty's algorithm [31] or the Douglas-Rachford algorithm [23]. Other important questions include the derivation of convergence rates. Although a complexity analysis of the stochastic proximal point algorithm (1.1) seems out of reach in the general setting, it would be important to address such an analysis in the special case of convex programming (1.3). The paper [28] follows such an approach, in the case where the convex functions are used explicitly. An interesting perspective would be to extend the method to the case of the stochastic proximal point algorithm. An alternative is to investigate asymptotic convergence rates, as in [41]. Finally, the relaxation of the i.i.d. assumption over the random monotone operators would be an important problem for future work.